PREFACE


This book is both a tutorial and a textbook. It presents an introduction
to probability and mathematical statistics and is intended for students
who already have some elementary mathematical background. It is designed for
a one-year junior or senior level undergraduate or beginning graduate level
course in probability theory and mathematical statistics. The book contains
more material than would normally be taught in a one-year course. This
should give the teacher flexibility in selecting the content and the level
at which the book is used. The book is based on over 15 years of lectures
in senior level calculus-based courses in probability theory and
mathematical statistics at the University of Louisville.

Probability theory and mathematical statistics are difficult subjects both
for students to comprehend and for teachers to explain. Despite the
publication of a great many textbooks in this field, each one intended to
provide an improvement over the previous textbooks, this subject is still
difficult to comprehend. A good set of examples makes these subjects easier
to understand. For this reason alone I have included more than 350 completely
worked out examples and over 165 illustrations. I give a rigorous treatment
of the fundamentals of probability and statistics using mostly calculus. I
have given great attention to the clarity of the presentation of the
materials. In the text, theoretical results are presented as theorems,
propositions or lemmas, for which as a rule rigorous proofs are given. For
the few exceptions to this rule, references are given to indicate where
details can be found. This book contains over 450 problems of varying degrees
of difficulty to help students master their problem-solving skills.

In many existing textbooks, the examples following the explanation of
a topic are too few in number or too simple for the reader to obtain a
thorough grasp of the principles involved. Often, examples are presented in
abbreviated form that leaves out much material between steps and requires
that students derive the omitted material themselves. As a result, students
find examples difficult to understand. Moreover, in some textbooks, examples
are often worded in a confusing manner. They do not state the problem and
then present the solution. Instead, they pass through a general discussion,
never revealing what is to be solved for. In this book, I give many examples
to illustrate each topic. Often I provide illustrations to promote a better
understanding of the topic. All examples in this book are formulated as
questions, and clear and concise answers are provided in step-by-step detail.

There are several good books on these subjects and perhaps there is 
no need to bring a new one to the market. So for several years, this was 
circulated as a series of typeset lecture notes among my students who were 
preparing for the examination 110 of the Actuarial Society of America. Many 
of my students encouraged me to formally write it as a book. Actuarial 
students will benefit greatly from this book. The book is written in simple 
English; this might be an advantage to students whose native language is not 
English. 

I cannot claim that all the materials I have written in this book are mine. 
I have learned the subject from many excellent books, such as Introduction 
to Mathematical Statistics by Hogg and Craig, and An Introduction to
Probability Theory and Its Applications by Feller. In fact, these books have had
a profound impact on me, and my explanations are influenced greatly by 
these textbooks. If there are some similarities, then it is due to the fact 
that I could not make improvements on the original explanations. I am very 
thankful to the authors of these great textbooks. I am also thankful to the 
Actuarial Society of America for letting me use their test problems. I thank 
all my students in my probability theory and mathematical statistics courses 
from 1988 to 2005 who helped me in many ways to make this book possible 
in the present form. Lastly, if it weren’t for the infinite patience of my wife,
Sadhna, this book would never have gotten off the hard drive of my computer.

The author typeset the entire book on a Macintosh computer using TeX, the
typesetting system designed by Donald Knuth. The figures were generated
by the author using MATHEMATICA, a system for doing mathematics
designed by Wolfram Research, and MAPLE, a system for doing mathematics
designed by Maplesoft. The author is very thankful to the University of
Louisville for providing many internal financial grants while this book was
under preparation.


Prasanna Sahoo, Louisville 
Chapter 1

PROBABILITY OF EVENTS 


1.1. Introduction 

During his lecture in 1929, Bertrand Russell said, “Probability is the most
important concept in modern science, especially as nobody has the slightest
notion what it means.” Most people have some vague ideas about what the
probability of an event means. The interpretation of the word probability
involves synonyms such as chance, odds, uncertainty, prevalence, risk,
expectancy, etc. “We use probability when we want to make an affirmation,
but are not quite sure,” writes J.R. Lucas.

There are many distinct interpretations of the word probability. A complete
discussion of these interpretations would take us to areas such as
philosophy, theory of algorithm and randomness, religion, etc. Thus, we will
only focus on two extreme interpretations. One interpretation is due to the
so-called objective school and the other is due to the subjective school.

The subjective school defines probabilities as subjective assignments 
based on rational thought with available information. Some subjective
probabilists interpret probabilities as the degree of belief. Thus, it is difficult to
interpret the probability of an event. 

The objective school defines probabilities to be “long run” relative
frequencies. This means that one should compute a probability by taking the
number of times the event occurs in repeated trials of an experiment and
dividing it by the total number of trials, and then taking the limit
as the number of trials becomes large. Some statisticians object to the
phrase “long run”. John Maynard Keynes said “in the long run we are all
dead”. The objective school uses the theory developed by
Von Mises (1928) and Kolmogorov (1965). The Russian mathematician
Kolmogorov gave a solid foundation to probability theory using measure theory.
The advantage of Kolmogorov’s theory is that one can construct probabilities
according to the rules, compute other probabilities using axioms, and then
interpret these probabilities.

In this book, we will study mathematically one interpretation of
probability out of many. In fact, we will study probability theory based on the
theory developed by the late Kolmogorov. There are many applications of
probability theory. We are studying probability theory because we would
like to study mathematical statistics. Statistics is concerned with the
development of methods and their applications for collecting, analyzing and
interpreting quantitative data in such a way that the reliability of a
conclusion based on data may be evaluated objectively by means of probability
statements. Probability theory is used to evaluate the reliability of
conclusions and inferences based on data. Thus, probability theory is
fundamental to mathematical statistics.

For an event A of a finite sample space S whose outcomes are equally
likely, the probability of A can be computed by using the formula

$$P(A) = \frac{N(A)}{N(S)}$$


where N(A) denotes the number of elements of A and N(S) denotes the 
number of elements in the sample space S. For a discrete case, the probability 
of an event A can be computed by counting the number of elements in A and 
dividing it by the number of elements in the sample space S. 
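
As a minimal sketch of this counting formula, the following Python snippet computes P(A) = N(A)/N(S) for one roll of a fair die. The helper name `classical_probability` and the choice of event (an even number turns up) are illustrative assumptions and do not come from the text.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Return N(A)/N(S), assuming all outcomes in S are equally likely."""
    sample_space = set(sample_space)
    event = set(event) & sample_space      # an event must be a subset of S
    return Fraction(len(event), len(sample_space))

S = {1, 2, 3, 4, 5, 6}                     # sample space of one roll of a die
A = {x for x in S if x % 2 == 0}           # event: an even number turns up

print(classical_probability(A, S))         # prints 1/2
```

Exact fractions are used instead of floating point numbers so that the result can be compared directly with hand computations.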

In the next section, we develop various counting techniques. The branch 
of mathematics that deals with the various counting techniques is called 
combinatorics. 


1.2. Counting Techniques 

There are three basic counting techniques. They are multiplication rule, 
permutation and combination. 

1.2.1. Multiplication Rule. If $E_1$ is an experiment with $n_1$ outcomes
and $E_2$ is an experiment with $n_2$ possible outcomes, then the experiment
which consists of performing $E_1$ first and then $E_2$ consists of $n_1 n_2$
possible outcomes.

Example 1.1. Find the possible number of outcomes in a sequence of two
tosses of a fair coin.

Answer: The number of possible outcomes is 2 • 2 = 4. This is evident from 
the following tree diagram. 



Example 1.2. Find the number of possible outcomes of the rolling of a die 
and then tossing a coin. 

Answer: Here $n_1 = 6$ and $n_2 = 2$. Thus by the multiplication rule, the
number of possible outcomes is 12.



Example 1.3. How many different license plates are possible if Kentucky
uses three letters followed by three digits?

Answer:

$$(26)^3 (10)^3 = (17576)(1000) = 17,576,000.$$
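
The multiplication rule can be checked by brute-force enumeration. The sketch below lists the outcomes of Examples 1.1 and 1.2 with `itertools.product` and counts the license plates of Example 1.3 arithmetically; the variable names are illustrative assumptions.

```python
from itertools import product
from string import ascii_uppercase, digits

coin = ("H", "T")
die = (1, 2, 3, 4, 5, 6)

# Example 1.1: two tosses of a coin, every ordered pair of faces.
two_tosses = list(product(coin, repeat=2))

# Example 1.2: roll a die, then toss a coin.
die_then_coin = list(product(die, coin))

# Example 1.3: three letters followed by three digits. The plates are
# counted rather than listed, since there are millions of them.
plate_count = len(ascii_uppercase) ** 3 * len(digits) ** 3

print(len(two_tosses), len(die_then_coin), plate_count)
```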

1.2.2. Permutation 

Consider a set of 4 objects, say a, b, c and d. Suppose we want to fill 3
positions with objects selected from the above 4. Then the number of possible
ordered arrangements is 24 and they are

    a b c    b a c    c a b    d a b
    a b d    b a d    c a d    d a c
    a c b    b c a    c b a    d b c
    a c d    b c d    c b d    d b a
    a d c    b d a    c d b    d c a
    a d b    b d c    c d a    d c b



The number of possible ordered arrangements can be computed as follows: 
Since there are 3 positions and 4 objects, the first position can be filled in 
4 different ways. Once the first position is filled the remaining 2 positions 
can be filled from the remaining 3 objects. Thus, the second position can be 
filled in 3 ways. The third position can be filled in 2 ways. Then the total 
number of ways 3 positions can be filled out of 4 objects is given by 

(4) (3) (2) = 24. 

In general, if r positions are to be filled from n objects, then the total
number of possible ways they can be filled is given by

$${}_nP_r = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!}.$$

Thus, ${}_nP_r$ represents the number of ways r positions can be filled from n
objects.

Definition 1.1. Each of the ${}_nP_r$ arrangements is called a permutation of n
objects taken r at a time.
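
As a quick check of the permutation count, the following sketch enumerates all ordered arrangements of 3 objects chosen from 4 and compares the count with n!/(n − r)!. The object labels are an illustrative assumption.

```python
from itertools import permutations
from math import factorial, perm

objects = "abcd"          # four distinct objects (labels are illustrative)
r = 3                     # number of positions to fill

arrangements = list(permutations(objects, r))
n = len(objects)

print(len(arrangements))                     # number of ordered arrangements
print(factorial(n) // factorial(n - r))      # n! / (n - r)!
print(perm(n, r))                            # same count from the standard library
```

`math.perm` is available in Python 3.8 and later; the explicit factorial quotient is shown alongside it to mirror the formula in the text.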

Example 1.4. How many permutations are there of all three of the letters
a, b, and c?

Answer:

$${}_3P_3 = \frac{3!}{(3-3)!} = \frac{3!}{0!} = 6.$$

Example 1.5. Find the number of permutations of n distinct objects.

Answer:

$${}_nP_n = \frac{n!}{(n-n)!} = \frac{n!}{0!} = n!.$$

Example 1.6. Four names are drawn from the 24 members of a club for the 
offices of President, Vice-President, Treasurer, and Secretary. In how many 
different ways can this be done? 

Answer: 

$${}_{24}P_4 = \frac{(24)!}{(20)!} = (24)(23)(22)(21) = 255,024.$$


1.2.3. Combination 

In permutation, order is important. But in many problems the order of 
selection is not important and interest centers only on the set of r objects. 

Let c denote the number of subsets of size r that can be selected from
n different objects. The r objects in each set can be ordered in ${}_rP_r$ ways.
Thus we have

$${}_nP_r = c \left({}_rP_r\right).$$

From this, we get

$$c = \frac{{}_nP_r}{{}_rP_r} = \frac{n!}{(n-r)!\,r!}.$$

The number c is denoted by $\binom{n}{r}$. Thus, the above can be written as

$$\binom{n}{r} = \frac{n!}{(n-r)!\,r!}.$$

Definition 1.2. Each of the $\binom{n}{r}$ unordered subsets is called a combination
of n objects taken r at a time.

Example 1.7. How many committees of two chemists and one physicist can
be formed from 4 chemists and 3 physicists?

Answer:

$$\binom{4}{2}\binom{3}{1} = (6)(3) = 18.$$

Thus 18 different committees can be formed. 
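
The committee count can be confirmed by listing the committees explicitly. In this sketch the member labels (C1, ..., P3) are hypothetical; only the group sizes come from Example 1.7.

```python
from itertools import combinations, product
from math import comb

chemists = ["C1", "C2", "C3", "C4"]     # hypothetical labels for 4 chemists
physicists = ["P1", "P2", "P3"]         # hypothetical labels for 3 physicists

# Choose 2 chemists and 1 physicist; order within a committee is irrelevant.
committees = [
    chem_pair + phys
    for chem_pair, phys in product(combinations(chemists, 2),
                                   combinations(physicists, 1))
]

print(len(committees))                  # count by enumeration
print(comb(4, 2) * comb(3, 1))          # count by the combination formula
```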

1.2.4. Binomial Theorem 

We know from lower level mathematics courses that

$$(x+y)^2 = x^2 + 2xy + y^2 = \sum_{k=0}^{2} \binom{2}{k} x^{2-k} y^k.$$

Similarly

$$(x+y)^3 = x^3 + 3x^2y + 3xy^2 + y^3 = \sum_{k=0}^{3} \binom{3}{k} x^{3-k} y^k.$$

In general, using induction arguments, we can show that

$$(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^k.$$

This result is called the Binomial Theorem. The coefficient $\binom{n}{k}$ is called the
binomial coefficient. A combinatorial proof of the Binomial Theorem follows.
If we write $(x+y)^n$ as the n-fold product of the factor $(x+y)$, that is

$$(x+y)^n = (x+y)(x+y)(x+y)\cdots(x+y),$$


then the coefficient of $x^{n-k}y^k$ is $\binom{n}{k}$, that is, the number of ways in
which we can choose the k factors providing the y’s.

Remark 1.1. In 1665, Newton discovered the Binomial Series. The Binomial
Series is given by

$$(1+y)^{\alpha} = \sum_{k=0}^{\infty} \binom{\alpha}{k} y^k$$

where $\alpha$ is a real number and

$$\binom{\alpha}{k} = \frac{\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-k+1)}{k!}.$$

This $\binom{\alpha}{k}$ is called the generalized binomial coefficient.
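
The two formulas above can be sketched in a few lines of Python. The function names `binomial_expansion` and `generalized_binomial` are illustrative assumptions; the first evaluates the right-hand side of the Binomial Theorem, and the second evaluates the generalized coefficient exactly with rational arithmetic.

```python
from fractions import Fraction
from math import comb

def binomial_expansion(x, y, n):
    """Right-hand side of the Binomial Theorem: sum of C(n,k) x^(n-k) y^k."""
    return sum(comb(n, k) * x ** (n - k) * y ** k for k in range(n + 1))

def generalized_binomial(alpha, k):
    """alpha (alpha-1) ... (alpha-k+1) / k!, computed as an exact fraction."""
    result = Fraction(1)
    for j in range(k):
        result *= Fraction(alpha) - j     # next factor of the numerator
        result /= j + 1                   # next factor of k!
    return result

print(binomial_expansion(2, 3, 5), (2 + 3) ** 5)   # both sides of the theorem
print(generalized_binomial(Fraction(1, 2), 2))     # coefficient for alpha = 1/2
print(generalized_binomial(5, 2), comb(5, 2))      # agrees with C(5,2) for integers
```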

Now, we investigate some properties of the binomial coefficients. 

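Theorem 1.1. Let $n \in \mathbb{N}$ (the set of natural numbers) and r = 0, 1, 2, ..., n.
Then

$$\binom{n}{r} = \binom{n}{n-r}.$$

Proof: By direct verification, we get

$$\binom{n}{n-r} = \frac{n!}{(n-n+r)!\,(n-r)!} = \frac{n!}{r!\,(n-r)!} = \binom{n}{r}.$$

This theorem says that the binomial coefficients are symmetrical.

Example 1.8. Evaluate $\binom{3}{1} + \binom{3}{2} + \binom{3}{0}$.

Answer: Since the combinations of 3 things taken 1 at a time are 3, we get
$\binom{3}{1} = 3$. Similarly, $\binom{3}{0}$ is 1. By Theorem 1.1,
$\binom{3}{2} = \binom{3}{1} = 3$. Hence

$$\binom{3}{1} + \binom{3}{2} + \binom{3}{0} = 3 + 3 + 1 = 7.$$

Theorem 1.2. For any positive integer n and r = 1, 2, 3, ..., n, we have

$$\binom{n}{r} = \binom{n-1}{r} + \binom{n-1}{r-1}.$$

Proof:

$$(1+y)^n = (1+y)(1+y)^{n-1} = (1+y)^{n-1} + y\,(1+y)^{n-1}$$

so that, by the Binomial Theorem,

$$\begin{aligned}
\sum_{r=0}^{n} \binom{n}{r} y^r
  &= \sum_{r=0}^{n-1} \binom{n-1}{r} y^r + y \sum_{r=0}^{n-1} \binom{n-1}{r} y^r \\
  &= \sum_{r=0}^{n-1} \binom{n-1}{r} y^r + \sum_{r=0}^{n-1} \binom{n-1}{r} y^{r+1}.
\end{aligned}$$

Equating the coefficients of $y^r$ from both sides of the above expression, we
obtain

$$\binom{n}{r} = \binom{n-1}{r} + \binom{n-1}{r-1}$$

and the proof is now complete.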
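Example 1.9. Evaluate $\binom{23}{10} + \binom{23}{9} + \binom{24}{11}$.

Answer: Applying Theorem 1.2 twice, we get

$$\begin{aligned}
\binom{23}{10} + \binom{23}{9} + \binom{24}{11}
  &= \binom{24}{10} + \binom{24}{11} \\
  &= \binom{25}{11} \\
  &= \frac{25!}{(14)!\,(11)!} \\
  &= 4,457,400.
\end{aligned}$$

Example 1.10. Use the Binomial Theorem to show that

$$\sum_{r=0}^{n} (-1)^r \binom{n}{r} = 0.$$

Answer: Using the Binomial Theorem, we get

$$(1+x)^n = \sum_{r=0}^{n} \binom{n}{r} x^r$$

for all real numbers x. Letting x = −1 in the above, we get

$$0 = \sum_{r=0}^{n} \binom{n}{r} (-1)^r.$$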

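Theorem 1.3. Let m and n be positive integers. Then

$$\sum_{r=0}^{k} \binom{m}{r} \binom{n}{k-r} = \binom{m+n}{k}.$$

Proof:

$$(1+y)^{m+n} = (1+y)^m (1+y)^n$$

$$\sum_{r=0}^{m+n} \binom{m+n}{r} y^r
  = \left\{ \sum_{r=0}^{m} \binom{m}{r} y^r \right\}
    \left\{ \sum_{r=0}^{n} \binom{n}{r} y^r \right\}.$$

Equating the coefficients of $y^k$ from both sides of the above expression,
we obtain

$$\binom{m+n}{k} = \binom{m}{0}\binom{n}{k} + \binom{m}{1}\binom{n}{k-1} + \cdots + \binom{m}{k}\binom{n}{k-k}$$

and the conclusion of the theorem follows.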
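Example 1.11. Show that

$$\sum_{r=0}^{n} \binom{n}{r}^2 = \binom{2n}{n}.$$

Answer: Let k = n and m = n. Then from Theorem 1.3, we get

$$\begin{aligned}
\sum_{r=0}^{k} \binom{m}{r} \binom{n}{k-r} &= \binom{m+n}{k} \\
\sum_{r=0}^{n} \binom{n}{r} \binom{n}{n-r} &= \binom{2n}{n} \\
\sum_{r=0}^{n} \binom{n}{r} \binom{n}{r} &= \binom{2n}{n} \\
\sum_{r=0}^{n} \binom{n}{r}^2 &= \binom{2n}{n}.
\end{aligned}$$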
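
The identities of Theorems 1.1 to 1.3 and Examples 1.10 and 1.11 can be spot-checked numerically. The sketch below tests them for small values only (the bound `max_n` is an arbitrary assumption); it is a sanity check and not a proof.

```python
from math import comb

def check_identities(max_n=10):
    """Spot-check the binomial identities for 1 <= n, m <= max_n."""
    for n in range(1, max_n + 1):
        for r in range(n + 1):
            # Theorem 1.1: symmetry
            assert comb(n, r) == comb(n, n - r)
        for r in range(1, n + 1):
            # Theorem 1.2: C(n,r) = C(n-1,r) + C(n-1,r-1)
            assert comb(n, r) == comb(n - 1, r) + comb(n - 1, r - 1)
        # Example 1.10: alternating sum is zero
        assert sum((-1) ** r * comb(n, r) for r in range(n + 1)) == 0
        # Example 1.11: sum of squares equals C(2n, n)
        assert sum(comb(n, r) ** 2 for r in range(n + 1)) == comb(2 * n, n)
        for m in range(1, max_n + 1):
            for k in range(m + n + 1):
                # Theorem 1.3: sum over r of C(m,r) C(n,k-r) = C(m+n,k)
                lhs = sum(comb(m, r) * comb(n, k - r) for r in range(k + 1))
                assert lhs == comb(m + n, k)
    return True

print(check_identities())
```

`math.comb(n, k)` returns 0 when k exceeds n, which matches the convention needed for the boundary terms.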
Theorem 1.4. Let n be a positive integer and k = 1, 2, 3, ..., n. Then

$$\binom{n}{k} = \sum_{m=k-1}^{n-1} \binom{m}{k-1}.$$

Proof: In order to establish the above identity, we use the Binomial Theorem
together with the following result of elementary algebra

$$x^n - y^n = (x-y) \sum_{k=0}^{n-1} x^k y^{n-1-k}.$$

Note that

$$\begin{aligned}
\sum_{k=1}^{n} \binom{n}{k} x^k
  &= (x+1)^n - 1^n && \text{by Binomial Theorem} \\
  &= (x+1-1) \sum_{m=0}^{n-1} (x+1)^m && \text{by above identity} \\
  &= x \sum_{m=0}^{n-1} \sum_{j=0}^{m} \binom{m}{j} x^j \\
  &= \sum_{m=0}^{n-1} \sum_{j=0}^{m} \binom{m}{j} x^{j+1} \\
  &= \sum_{k=1}^{n} \sum_{m=k-1}^{n-1} \binom{m}{k-1} x^k.
\end{aligned}$$

Hence equating the coefficient of $x^k$, we obtain

$$\binom{n}{k} = \sum_{m=k-1}^{n-1} \binom{m}{k-1}.$$

This completes the proof of the theorem.
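
A short numerical check of the identity in Theorem 1.4 follows. The function name `column_sum` and the tested range are illustrative assumptions.

```python
from math import comb

def column_sum(n, k):
    """Right-hand side of Theorem 1.4: sum of C(m, k-1) for m = k-1, ..., n-1."""
    return sum(comb(m, k - 1) for m in range(k - 1, n))

# Compare both sides for every admissible k and small n.
all_match = all(
    comb(n, k) == column_sum(n, k)
    for n in range(1, 15)
    for k in range(1, n + 1)
)
print(all_match)
```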

The following result

$$(x_1 + x_2 + \cdots + x_m)^n = \sum_{n_1+n_2+\cdots+n_m=n} \binom{n}{n_1, n_2, \ldots, n_m} x_1^{n_1} x_2^{n_2} \cdots x_m^{n_m}$$

is known as the multinomial theorem and it generalizes the binomial theorem.
The sum is taken over all nonnegative integers $n_1, n_2, \ldots, n_m$ such that
$n_1 + n_2 + \cdots + n_m = n$, and

$$\binom{n}{n_1, n_2, \ldots, n_m} = \frac{n!}{n_1!\,n_2!\cdots n_m!}.$$



Probability and Mathematical Statistics 




This coefficient is known as the multinomial coefficient. 
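
The multinomial coefficient and the multinomial theorem can be illustrated as follows. This is a brute-force sketch for small inputs; the function names are assumptions, and the exponents range over nonnegative integers that sum to n.

```python
from itertools import product
from math import factorial

def multinomial(n, parts):
    """n! / (n_1! n_2! ... n_m!) for nonnegative parts summing to n."""
    if sum(parts) != n or any(p < 0 for p in parts):
        raise ValueError("parts must be nonnegative and sum to n")
    coeff = factorial(n)
    for p in parts:
        coeff //= factorial(p)            # each division is exact
    return coeff

def multinomial_expansion(xs, n):
    """Right-hand side of the multinomial theorem, summed term by term."""
    total = 0
    for parts in product(range(n + 1), repeat=len(xs)):
        if sum(parts) != n:
            continue                      # keep only exponents summing to n
        term = multinomial(n, parts)
        for x, p in zip(xs, parts):
            term *= x ** p
        total += term
    return total

print(multinomial(4, (2, 1, 1)))
print(multinomial_expansion((1, 2, 3), 4), (1 + 2 + 3) ** 4)
```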

1.3. Probability Measure 

A random experiment is an experiment whose outcomes cannot be predicted
with certainty. However, in most cases the collection of every possible
outcome of a random experiment can be listed.

Definition 1.3. A sample space of a random experiment is the collection of 
all possible outcomes. 

Example 1.12. What is the sample space for an experiment in which we 
select a rat at random from a cage and determine its sex? 

Answer: The sample space of this experiment is 

S = {M, F} 

where M denotes the male rat and F denotes the female rat. 

Example 1.13. What is the sample space for an experiment in which the 
state of Kentucky picks a three digit integer at random for its daily lottery? 

Answer: The sample space of this experiment is 

S = {000, 001, 002, ..., 998, 999}.


Example 1.14. What is the sample space for an experiment in which we 
roll a pair of dice, one red and one green? 

Answer: The sample space S for this experiment is given by

    S = {(1,1)  (1,2)  (1,3)  (1,4)  (1,5)  (1,6)
         (2,1)  (2,2)  (2,3)  (2,4)  (2,5)  (2,6)
         (3,1)  (3,2)  (3,3)  (3,4)  (3,5)  (3,6)
         (4,1)  (4,2)  (4,3)  (4,4)  (4,5)  (4,6)
         (5,1)  (5,2)  (5,3)  (5,4)  (5,5)  (5,6)
         (6,1)  (6,2)  (6,3)  (6,4)  (6,5)  (6,6)}

This set S can be written as

$$S = \{(x,y) \mid 1 \le x \le 6,\ 1 \le y \le 6\}$$

where x represents the number rolled on the red die and y denotes the number
rolled on the green die.
Definition 1.4. Each element of the sample space is called a sample point.

Definition 1.5. If the sample space consists of a countable number of sample 
points, then the sample space is said to be a countable sample space. 

Definition 1.6. If a sample space contains an uncountable number of sample 
points, then it is called a continuous sample space. 

An event A is a subset of the sample space S. It seems obvious that if A
and B are events in sample space S, then $A \cup B$, $A^c$, $A \cap B$ are also
entitled to be events. Thus precisely we define an event as follows:

Definition 1.7. A subset A of the sample space S is said to be an event if it
belongs to a collection $\mathcal{F}$ of subsets of S satisfying the following three
rules: (a) $S \in \mathcal{F}$; (b) if $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$; and (c) if
$A_j \in \mathcal{F}$ for $j \ge 1$, then $\bigcup_{j=1}^{\infty} A_j \in \mathcal{F}$. The collection
$\mathcal{F}$ is called an event space or a $\sigma$-field. If the outcome of an
experiment belongs to A, then we say that the event A has occurred.
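
For a finite sample space the three rules of Definition 1.7 can be checked mechanically. The sketch below is an illustration under that finiteness assumption, in which closure under pairwise unions is enough to give closure under countable unions; the function names are hypothetical.

```python
from itertools import chain, combinations

def power_set(S):
    """All subsets of S, each stored as a frozenset."""
    items = list(S)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

def is_event_space(F, S):
    """Check rules (a), (b), (c) of Definition 1.7 for a finite collection F."""
    S = frozenset(S)
    if S not in F:                                   # rule (a)
        return False
    if any(S - A not in F for A in F):               # rule (b): complements
        return False
    return all(A | B in F for A in F for B in F)     # rule (c): unions

S = {1, 2, 3}
print(is_event_space(power_set(S), S))                          # full power set
print(is_event_space({frozenset(), frozenset(S)}, S))           # trivial collection
print(is_event_space({frozenset(), frozenset({1}), frozenset(S)}, S))
```

The last collection fails because the complement of {1} is missing from it.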

Example 1.15. Describe the sample space of rolling a die and interpret the 
event {1,2}. 

Answer: The sample space of this experiment is 

S = {1, 2, 3, 4, 5, 6}.

The event {1,2} means getting either a 1 or a 2. 

Example 1.16. First describe the sample space of rolling a pair of dice, 
then describe the event A that the sum of numbers rolled is 7. 

Answer: The sample space of this experiment is 

$$S = \{(x,y) \mid x, y = 1, 2, 3, 4, 5, 6\}$$

and

$$A = \{(1,6), (6,1), (2,5), (5,2), (4,3), (3,4)\}.$$

Definition 1.8. Let S be the sample space of a random experiment. A
probability measure $P : \mathcal{F} \to [0,1]$ is a set function which assigns real
numbers to the various events of S satisfying

(P1) $P(A) \ge 0$ for all events $A \in \mathcal{F}$,

(P2) $P(S) = 1$,

(P3) $P\left(\bigcup_{k=1}^{\infty} A_k\right) = \sum_{k=1}^{\infty} P(A_k)$

if $A_1, A_2, A_3, \ldots, A_k, \ldots$ are mutually disjoint events of S.

Any set function with the above three properties is a probability measure 
for S. For a given sample space S, there may be more than one probability 
measure. The probability of an event A is the value of the probability measure 
at A, that is 

Prob(A) = P{A). 

Theorem 1.5. If $\emptyset$ is an empty set (that is, an impossible event), then

$$P(\emptyset) = 0.$$

Proof: Let $A_1 = S$ and $A_i = \emptyset$ for i = 2, 3, .... Then

$$S = \bigcup_{i=1}^{\infty} A_i$$

where $A_i \cap A_j = \emptyset$ for $i \ne j$. By axiom 2 and axiom 3, we get

$$\begin{aligned}
1 &= P(S) && \text{(by axiom 2)} \\
  &= \sum_{i=1}^{\infty} P(A_i) && \text{(by axiom 3)} \\
  &= P(A_1) + \sum_{i=2}^{\infty} P(A_i) \\
  &= P(S) + \sum_{i=2}^{\infty} P(\emptyset) \\
  &= 1 + \sum_{i=2}^{\infty} P(\emptyset).
\end{aligned}$$

Therefore

$$\sum_{i=2}^{\infty} P(\emptyset) = 0.$$

Since $P(\emptyset) \ge 0$ by axiom 1, we have

$$P(\emptyset) = 0$$

and the proof of the theorem is complete.

This theorem says that the probability of an impossible event is zero. 
Note that if the probability of an event is zero, that does not mean the event 
is empty (or impossible). There are random experiments in which there are 
infinitely many events each with probability 0. Similarly, if A is an event 
with probability 1, then it does not mean A is the sample space S. In fact 
there are random experiments in which one can find infinitely many events 
each with probability 1. 

Theorem 1.6. Let $\{A_1, A_2, \ldots, A_n\}$ be a finite collection of n events such
that $A_i \cap A_j = \emptyset$ for $i \ne j$. Then

$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i).$$

Proof: Consider the collection $\{A'_i\}_{i=1}^{\infty}$ of the subsets of the sample
space S such that

$$A'_1 = A_1,\ A'_2 = A_2,\ \ldots,\ A'_n = A_n$$

and

$$A'_{n+1} = A'_{n+2} = A'_{n+3} = \cdots = \emptyset.$$

Hence

$$\begin{aligned}
P\left(\bigcup_{i=1}^{n} A_i\right)
  &= P\left(\bigcup_{i=1}^{\infty} A'_i\right) \\
  &= \sum_{i=1}^{\infty} P(A'_i) \\
  &= \sum_{i=1}^{n} P(A'_i) + \sum_{i=n+1}^{\infty} P(A'_i) \\
  &= \sum_{i=1}^{n} P(A_i) + \sum_{i=n+1}^{\infty} P(\emptyset) \\
  &= \sum_{i=1}^{n} P(A_i) + 0 \\
  &= \sum_{i=1}^{n} P(A_i)
\end{aligned}$$

and the proof of the theorem is now complete.

When n = 2, the above theorem yields $P(A_1 \cup A_2) = P(A_1) + P(A_2)$
where $A_1$ and $A_2$ are disjoint (or mutually exclusive) events.

In the following theorem, we give a method for computing probability 
of an event A by knowing the probabilities of the elementary events of the 
sample space S. 

Theorem 1.7. If A is an event of a discrete sample space S, then the 
probability of A is equal to the sum of the probabilities of its elementary 
events. 

Proof: Any set A in S can be written as the union of its singleton sets. Let
$\{O_i\}_{i=1}^{\infty}$ be the collection of all the singleton sets (or the elementary
events) of A. Then

$$A = \bigcup_{i=1}^{\infty} O_i.$$

By axiom (P3), we get

$$P(A) = P\left(\bigcup_{i=1}^{\infty} O_i\right) = \sum_{i=1}^{\infty} P(O_i).$$

Example 1.17. If a fair coin is tossed twice, what is the probability of 
getting at least one head? 

Answer: The sample space of this experiment is 

S = {HH, HT, TH, TT}. 

The event A is given by 

$$A = \{\text{at least one head}\} = \{HH, HT, TH\}.$$

By Theorem 1.7, the probability of A is the sum of the probabilities of its
elementary events. Thus, we get

$$P(A) = P(HH) + P(HT) + P(TH) = \frac{1}{4} + \frac{1}{4} + \frac{1}{4} = \frac{3}{4}.$$

Remark 1.2. Notice that here we are not computing the probability of the
elementary events by taking the number of points in the elementary event
and dividing by the total number of points in the sample space. We are
using the randomness to obtain the probability of the elementary events.
That is, we are assuming that each outcome is equally likely. This is why
randomness is an integral part of probability theory.
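
Theorem 1.7 and Example 1.17 can be mirrored in code by storing a probability for each elementary event and summing over the event. The sketch below assumes a fair coin, so each of the four outcomes has probability 1/4; the dictionary `P` and the other names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space of two tosses: HH, HT, TH, TT.
S = ["".join(t) for t in product("HT", repeat=2)]

# Probability of each elementary event, assuming equally likely outcomes.
P = {s: Fraction(1, len(S)) for s in S}

# Event A: at least one head.
A = [s for s in S if "H" in s]

prob_A = sum(P[s] for s in A)   # sum over the elementary events of A
print(A, prob_A)
```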

Corollary 1.1. If S is a finite sample space with n equally likely sample
elements and A is an event in S with m elements, then the probability of A
is given by

$$P(A) = \frac{m}{n}.$$

$$\cdots = (1 + 2 + 3 + 4 + 5 + 6)\,k = \frac{(6)(6+1)}{2}\,k = 21k.$$

Using (P2), we get

$$21k = 1.$$

Thus $k = \frac{1}{21}$. Hence, we have

$$P(\{x\}) = \frac{x}{21}.$$

Now, we want to find the probability of the odd number of dots turning up.

$$P(\text{odd numbered dot will turn up}) = P(\{1\}) + P(\{3\}) + P(\{5\}) = \frac{1}{21} + \frac{3}{21} + \frac{5}{21} = \frac{9}{21}.$$
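
The computation above can be reproduced in a few lines. The sketch assumes, as the surviving steps suggest, a die on which face x turns up with probability kx for some constant k; that setup is inferred from the displayed arithmetic and is labeled here as an assumption.

```python
from fractions import Fraction

faces = range(1, 7)

# Assumed model: P({x}) = k * x. Axiom (P2) forces the total to be 1,
# so k is the reciprocal of 1 + 2 + ... + 6.
k = Fraction(1, sum(faces))
P = {x: k * x for x in faces}

p_odd = sum(P[x] for x in faces if x % 2 == 1)
print(k, p_odd)     # p_odd is 9/21 in lowest terms
```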


Remark 1.3. Recall that the sum of the first n positive integers is equal to
$\frac{n}{2}(n+1)$. That is,

$$1 + 2 + 3 + \cdots + (n-2) + (n-1) + n = \frac{n(n+1)}{2}.$$

This formula was first proven by Gauss (1777-1855) when he was a young
school boy.

Remark 1.4. Gauss proved that the sum of the first n positive integers
is $\frac{n(n+1)}{2}$ when he was a school boy. Kolmogorov, the father of modern
probability theory, proved that the sum of the first n odd positive integers is
$n^2$, when he was five years old.


1.4. Some Properties of the Probability Measure 

Next, we present some theorems that will illustrate the various intuitive 
properties of a probability measure. 

Theorem 1.8. If A is any event of the sample space S, then

$$P(A^c) = 1 - P(A)$$

where $A^c$ denotes the complement of A with respect to S.

Proof: Let A be any subset of S. Then $S = A \cup A^c$. Further A and $A^c$ are
mutually disjoint. Thus, using (P3), we get

$$1 = P(S) = P(A \cup A^c) = P(A) + P(A^c).$$

Hence, we see that

$$P(A^c) = 1 - P(A).$$

This completes the proof. 

Theorem 1.9. If $A \subseteq B \subseteq S$, then

$$P(A) \le P(B).$$

Proof: Note that $B = A \cup (B \setminus A)$ where $B \setminus A$ denotes all the elements x
that are in B but not in A. Further, $A \cap (B \setminus A) = \emptyset$. Hence by (P3), we get

$$P(B) = P(A \cup (B \setminus A)) = P(A) + P(B \setminus A).$$

By axiom (P1), we know that $P(B \setminus A) \ge 0$. Thus, from the above, we get

$$P(B) \ge P(A)$$

and the proof is complete.

Theorem 1.10. If A is any event in S, then

$$0 \le P(A) \le 1.$$

Proof: Follows from axioms (P1) and (P2) and Theorem 1.8.

Theorem 1.10. If A and B are any two events, then

$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$

Proof: It is easy to see that

$$A \cup B = A \cup (A^c \cap B)$$

and

$$A \cap (A^c \cap B) = \emptyset.$$

Hence by (P3), we get

$$P(A \cup B) = P(A) + P(A^c \cap B). \tag{1.1}$$

But the set B can also be written as

$$B = (A \cap B) \cup (A^c \cap B).$$

Therefore, by (P3), we get

$$P(B) = P(A \cap B) + P(A^c \cap B). \tag{1.2}$$

Eliminating $P(A^c \cap B)$ from (1.1) and (1.2), we get

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

and the proof of the theorem is now complete.

This above theorem tells us how to calculate the probability that at least 
one of A and B occurs. 
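
The properties proved in this section can be checked on a concrete finite sample space. The sketch below uses two fair dice with equally likely outcomes; the events A (the sum is 7) and B (the first die shows 1) are illustrative choices, not taken from the text.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))      # 36 equally likely outcomes

def P(event):
    """Probability of an event on S under equally likely outcomes."""
    return Fraction(len(event & S), len(S))

A = {s for s in S if sum(s) == 7}            # the sum of the dice is 7
B = {s for s in S if s[0] == 1}              # the first die shows 1

union_direct = P(A | B)                      # count the union directly
union_formula = P(A) + P(B) - P(A & B)       # addition rule for two events

print(union_direct, union_formula)
print(P(S - A), 1 - P(A))                    # complement rule (Theorem 1.8)
```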

Example 1.19. If P(A) = 0.25 and P(B) = 0.8, then show that
$0.05 \le P(A \cap B) \le 0.25$.

Answer: Since $A \cap B \subseteq A$ and $A \cap B \subseteq B$, by Theorem 1.9, we get

$$P(A \cap B) \le P(A) \quad \text{and also} \quad P(A \cap B) \le P(B).$$

Hence

$$P(A \cap B) \le \min\{P(A), P(B)\}.$$

This shows that

$$P(A \cap B) \le 0.25. \tag{1.3}$$

Since $A \cup B \subseteq S$, by Theorem 1.9, we get

$$P(A \cup B) \le P(S).$$

That is, by Theorem 1.10,

$$P(A) + P(B) - P(A \cap B) \le P(S).$$

Hence, we obtain

$$0.8 + 0.25 - P(A \cap B) \le 1$$

and this yields

$$0.8 + 0.25 - 1 \le P(A \cap B).$$

From this, we get

$$0.05 \le P(A \cap B). \tag{1.4}$$

From (1.3) and (1.4), we get

$$0.05 \le P(A \cap B) \le 0.25.$$
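
The argument of Example 1.19 works for any two events, which suggests a small helper. The function name `intersection_bounds` is hypothetical; it returns the lower and upper bounds on P(A ∩ B) that follow from monotonicity and the addition rule.

```python
from fractions import Fraction

def intersection_bounds(p_a, p_b):
    """Bounds on P(A and B) given only P(A) and P(B)."""
    lower = max(Fraction(0), p_a + p_b - 1)   # from P(A or B) <= 1
    upper = min(p_a, p_b)                     # from monotonicity
    return lower, upper

# Example 1.19: P(A) = 0.25 = 1/4 and P(B) = 0.8 = 4/5.
low, high = intersection_bounds(Fraction(1, 4), Fraction(4, 5))
print(low, high)    # 1/20 and 1/4, that is, 0.05 and 0.25
```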

Example 1.20. Let A and B be events in a sample space S such that
$P(A) = \frac{1}{2} = P(B)$ and $P(A^c \cap B^c) = \frac{1}{3}$. Find $P(A \cup B^c)$.

Answer: Notice that

$$A \cup B^c = A \cup (A^c \cap B^c).$$

Hence,

$$P(A \cup B^c) = P(A) + P(A^c \cap B^c) = \frac{1}{2} + \frac{1}{3} = \frac{5}{6}.$$

Theorem 1.11. If $A_1$ and $A_2$ are two events such that $A_1 \subseteq A_2$, then

$$P(A_2 \setminus A_1) = P(A_2) - P(A_1).$$

Proof: The event $A_2$ can be written as

$$A_2 = A_1 \cup (A_2 \setminus A_1)$$

where the sets $A_1$ and $A_2 \setminus A_1$ are disjoint. Hence

$$P(A_2) = P(A_1) + P(A_2 \setminus A_1)$$

which is

$$P(A_2 \setminus A_1) = P(A_2) - P(A_1)$$

and the proof of the theorem is now complete.

From calculus we know that a real function $f : \mathbb{R} \to \mathbb{R}$ (the set of real
numbers) is continuous on $\mathbb{R}$ if and only if, for every convergent sequence
$\{x_n\}_{n=1}^{\infty}$ in $\mathbb{R}$,

$$\lim_{n \to \infty} f(x_n) = f\left(\lim_{n \to \infty} x_n\right).$$

Theorem 1.12. If $A_1, A_2, \ldots, A_n, \ldots$ is a sequence of events in sample space
S such that $A_1 \subseteq A_2 \subseteq \cdots \subseteq A_n \subseteq \cdots$, then

$$P\left(\bigcup_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} P(A_n).$$

Similarly, if $B_1, B_2, \ldots, B_n, \ldots$ is a sequence of events in sample space S such
that $B_1 \supseteq B_2 \supseteq \cdots \supseteq B_n \supseteq \cdots$, then

$$P\left(\bigcap_{n=1}^{\infty} B_n\right) = \lim_{n \to \infty} P(B_n).$$

Proof: Given an increasing sequence of events

$$A_1 \subseteq A_2 \subseteq \cdots \subseteq A_n \subseteq \cdots$$

we define a disjoint collection of events as follows:

$$E_1 = A_1, \qquad E_n = A_n \setminus A_{n-1} \quad \forall\, n \ge 2.$$

Then $\{E_n\}_{n=1}^{\infty}$ is a disjoint collection of events such that

$$\bigcup_{n=1}^{\infty} A_n = \bigcup_{n=1}^{\infty} E_n.$$

Further

$$\begin{aligned}
P\left(\bigcup_{n=1}^{\infty} A_n\right)
  &= P\left(\bigcup_{n=1}^{\infty} E_n\right) \\
  &= \sum_{n=1}^{\infty} P(E_n) \\
  &= \lim_{m \to \infty} \sum_{n=1}^{m} P(E_n) \\
  &= \lim_{m \to \infty} \left[ P(A_1) + \sum_{n=2}^{m} \left[ P(A_n) - P(A_{n-1}) \right] \right] \\
  &= \lim_{m \to \infty} P(A_m) \\
  &= \lim_{n \to \infty} P(A_n).
\end{aligned}$$

The second part of the theorem can be proved similarly.

Note that

$$\lim_{n \to \infty} A_n = \bigcup_{n=1}^{\infty} A_n$$

and

$$\lim_{n \to \infty} B_n = \bigcap_{n=1}^{\infty} B_n.$$

Hence the results of the above theorem can be written as

$$P\left(\lim_{n \to \infty} A_n\right) = \lim_{n \to \infty} P(A_n)$$

and

$$P\left(\lim_{n \to \infty} B_n\right) = \lim_{n \to \infty} P(B_n)$$

and Theorem 1.12 is called the continuity theorem for the probability
measure.
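
The telescoping step in the proof can be seen numerically. As an illustrative assumption (not an example from the text), let $A_m$ be the event that the first head of a fair coin appears on or before toss m, so that $E_n$ is the event that the first head appears exactly on toss n and $P(E_n) = (1/2)^n$.

```python
from fractions import Fraction

def prob_E(n):
    """P(E_n): the first head appears exactly on toss n of a fair coin."""
    return Fraction(1, 2 ** n)

def prob_A(m):
    """P(A_m) as the sum of P(E_n) for n = 1, ..., m (disjoint pieces)."""
    return sum(prob_E(n) for n in range(1, m + 1))

# P(A_m) increases with m and approaches 1, the probability of the union.
for m in (1, 2, 5, 10, 20):
    print(m, prob_A(m), float(prob_A(m)))
```

The printed values equal 1 − (1/2)^m, which shows the increasing sequence P(A_m) approaching the probability of the union of all the A_m.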

1.5. Review Exercises 


1. If we randomly pick two television sets in succession from a shipment of 
240 television sets of which 15 are defective, what is the probability that they 
will both be defective? 

2. A poll of 500 people determines that 382 like ice cream and 362 like cake. 
How many people like both if each of them likes at least one of the two? 
(Hint: Use $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.)

3. The Mathematics Department of the University of Louisville consists of 
8 professors, 6 associate professors, 13 assistant professors. In how many of 
all possible samples of size 4, chosen without replacement, will every type of 
professor be represented? 

4. A pair of dice consisting of a six-sided die and a four-sided die is rolled 
and the sum is determined. Let A be the event that a sum of 5 is rolled and 
let B be the event that a sum of 5 or a sum of 9 is rolled. Find (a) P(A), (b) 
P(B), and (c) $P(A \cap B)$.

5. A faculty leader was meeting two students in Paris, one arriving by 
train from Amsterdam and the other arriving from Brussels at approximately 
the same time. Let A and B be the events that the trains are on time, 
respectively. If P(A) = 0.93, P(B) = 0.89 and $P(A \cap B) = 0.87$, then find
the probability that at least one train is on time. 
6. Bill, George, and Ross, in order, roll a die. The first one to roll an even
number wins and the game is ended. What is the probability that Bill will 
win the game? 

7. Let A and B be events such that P(A) = \ = P(B) and P(A C D B c ) = |. 
Find the probability of the event A C U B c . 


8. Suppose a box contains 4 blue, 5 white, 6 red and 7 green balls. In how 
many of all possible samples of size 5, chosen without replacement, will every 
color be represented? 


9. Using the Binomial Theorem, show that

$$\sum_{k=0}^{n} k \binom{n}{k} = n\,2^{n-1}.$$


10. A function consists of a domain A, a co-domain B and a rule f. The
rule f assigns to each number in the domain A one and only one letter in the
co-domain B. If A = {1, 2, 3} and B = {x, y, z, w}, then find all the distinct
functions that can be formed from the set A into the set B.


11. Let S be a countable sample space. Let $\{O_i\}_{i=1}^{\infty}$ be the collection of all
the elementary events in S. What should be the value of the constant c such 
that P(Oi) = c (D* will be a probability measure in S ? 

12. A box contains five green balls, three black balls, and seven red balls. 
Two balls are selected at random without replacement from the box. What 
is the probability that both balls are the same color? 

13. Find the sample space of the random experiment which consists of tossing 
a coin until the first head is obtained. Is this sample space discrete? 

14. Find the sample space of the random experiment which consists of tossing 
a coin infinitely many times. Is this sample space discrete? 

15. Five fair dice are thrown. What is the probability that a full house is 
thrown (that is, where two dice show one number and other three dice show 
a second number)? 

16. If a fair coin is tossed repeatedly, what is the probability that the third 
head occurs on the n th toss? 


17. In a particular softball league each team consists of 5 women and 5 
men. In determining a batting order for 10 players, a woman must bat first, 
and successive batters must be of opposite sex. How many different batting 
orders are possible for a team? 
18. An urn contains 3 red balls, 2 green balls and 1 yellow ball. Three balls
are selected at random and without replacement from the urn. What is the 
probability that at least 1 color is not drawn? 

19. A box contains four $10 bills, six $5 bills and two $1 bills. Two bills are 
taken at random from the box without replacement. What is the probability 
that both bills will be of the same denomination? 

20. An urn contains n white counters numbered 1 through n, n black counters
numbered 1 through n, and n red counters numbered 1 through n. If
two counters are to be drawn at random without replacement, what is the 
probability that both counters will be of the same color or bear the same 
number? 

21. Two people take turns rolling a fair die. Person X rolls first, then 
person Y, then X , and so on. The winner is the first to roll a 6. What is the 
probability that person X wins? 

22. Mr. Flowers plants 10 rose bushes in a row. Eight of the bushes are 
white and two are red, and he plants them in random order. What is the 
probability that he will consecutively plant seven or more white bushes? 

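23. Using mathematical induction, show that

$$\frac{d^n}{dx^n}\left[f(x) \cdot g(x)\right] = \sum_{k=0}^{n} \binom{n}{k}\, \frac{d^k}{dx^k}\left[f(x)\right] \cdot \frac{d^{n-k}}{dx^{n-k}}\left[g(x)\right].$$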




Chapter 2

CONDITIONAL PROBABILITIES AND BAYES’ THEOREM


2.1. Conditional Probabilities 

First, we give a heuristic argument for the definition of conditional
probability, and then, based on that argument, we define the conditional
probability.

Consider a random experiment whose sample space is S. Let $B \subset S$.
In many situations, we are only concerned with those outcomes that are 
elements of B. This means that we consider B to be our new sample space. 



For the time being, suppose S is a nonempty finite sample space and B is
a nonempty subset of S. Given this new discrete sample space B, how do
we define the probability of an event A? Intuitively, one should define the
probability of A with respect to the new sample space B as (see the figure
above)

$$P(A \text{ given } B) = \frac{\text{the number of elements in } A \cap B}{\text{the number of elements in } B}.$$
We denote the conditional probability of A given the new sample space B as
P(A/B). Hence with this notation, we say that

$$P(A/B) = \frac{N(A \cap B)}{N(B)} = \frac{N(A \cap B)/N(S)}{N(B)/N(S)} = \frac{P(A \cap B)}{P(B)},$$

since $N(S) \neq 0$. Here N(S) denotes the number of elements in S.


Thus, if the sample space is finite, then the above definition of the
probability of an event A given that the event B has occurred makes sense
intuitively. Now we define the conditional probability for any sample space
(discrete or continuous) as follows.


Definition 2.1. Let S be a sample space associated with a random experiment.
The conditional probability of an event A, given that event B has
occurred, is defined by

$$P(A/B) = \frac{P(A \cap B)}{P(B)}$$

provided P(B) > 0.


This conditional probability measure P(A/B) satisfies all three axioms
of a probability measure. That is,

(CP1) $P(A/B) \geq 0$ for all events A
(CP2) P(B/B) = 1
(CP3) If $A_1, A_2, \ldots, A_k, \ldots$ are mutually exclusive events, then

$$P\Big(\bigcup_{k=1}^{\infty} A_k \,/\, B\Big) = \sum_{k=1}^{\infty} P(A_k/B).$$


Thus, it is a probability measure with respect to the new sample space B. 

Example 2.1. A drawer contains 4 black, 6 brown, and 8 olive socks. Two
socks are selected at random from the drawer. (a) What is the probability
that both socks are of the same color? (b) What is the probability that both
socks are olive if it is known that they are of the same color?

Answer: The sample space of this experiment consists of

$$S = \{(x, y) \mid x, y \in Bl, Ol, Br\}.$$

The cardinality of S is

$$N(S) = \binom{18}{2} = 153.$$
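Let A be the event that two socks selected at random are of the same color.
Then the cardinality of A is given by

$$N(A) = \binom{4}{2} + \binom{6}{2} + \binom{8}{2} = 6 + 15 + 28 = 49.$$

Therefore, the probability of A is given by

$$P(A) = \frac{49}{\binom{18}{2}} = \frac{49}{153}.$$

Let B be the event that two socks selected at random are olive. Then the
cardinality of B is given by

$$N(B) = \binom{8}{2} = 28$$

and hence

$$P(B) = \frac{\binom{8}{2}}{\binom{18}{2}} = \frac{28}{153}.$$

Notice that $B \subset A$. Hence,

$$P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{P(B)}{P(A)} = \left(\frac{28}{153}\right)\left(\frac{153}{49}\right) = \frac{28}{49} = \frac{4}{7}.$$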
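The counts in Example 2.1 can be checked by brute-force enumeration. The following is a minimal sketch, assuming every unordered pair of socks is equally likely; the labels and variable names are illustrative and not part of the text.

```python
from fractions import Fraction
from itertools import combinations

# Drawer contents from Example 2.1: 4 black, 6 brown, 8 olive socks.
socks = ["Bl"] * 4 + ["Br"] * 6 + ["Ol"] * 8

# Every unordered pair of distinct socks is one equally likely outcome.
pairs = list(combinations(range(len(socks)), 2))

# Event A: both socks have the same color. Event B: both socks are olive.
same_color = [(i, j) for i, j in pairs if socks[i] == socks[j]]
both_olive = [(i, j) for i, j in same_color if socks[i] == "Ol"]

p_same = Fraction(len(same_color), len(pairs))                    # P(A)
p_olive_given_same = Fraction(len(both_olive), len(same_color))   # P(B/A)

print(len(pairs), len(same_color), len(both_olive))
print(p_same, p_olive_given_same)
```

Because B is a subset of A, counting the olive pairs inside the same-color pairs is the same as computing N(A ∩ B)/N(A).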


Let A and B be two mutually disjoint events in a sample space S. We
want to find a formula for computing the probability that the event A occurs
before the event B in a sequence of trials. Let P(A) and P(B) be the
probabilities that A and B occur, respectively. Then the probability that neither A
nor B occurs is 1 - P(A) - P(B). Let us denote this probability by r, that
is r = 1 - P(A) - P(B).

In the first trial, either A occurs, or B occurs, or neither A nor B occurs.
If A occurs in the first trial, then the probability that A occurs before B is 1.
If B occurs in the first trial, then the probability that A occurs before B is 0.
If neither A nor B occurs in the first trial, we look at the outcomes of the
second trial. If A occurs in the second trial, then the probability that A occurs
before B is 1. If B occurs in the second trial, then the probability that A occurs
before B is 0. If neither A nor B occurs in the second trial, we look at the
outcomes of the third trial, and so on. This argument can be summarized in
the following diagram.



Hence the probability that the event A comes before the event B is given by

$$\begin{aligned}
P(A \text{ before } B) &= P(A) + r\,P(A) + r^2 P(A) + r^3 P(A) + \cdots + r^n P(A) + \cdots \\
&= P(A)\,[1 + r + r^2 + \cdots + r^n + \cdots] \\
&= P(A)\,\frac{1}{1 - r} \\
&= P(A)\,\frac{1}{1 - [1 - P(A) - P(B)]} \\
&= \frac{P(A)}{P(A) + P(B)}.
\end{aligned}$$


The event A before B can also be interpreted as a conditional event. In
this interpretation the event A before B means the occurrence of the event
A given that $A \cup B$ has already occurred. Thus we again have

$$P(A/A \cup B) = \frac{P(A \cap (A \cup B))}{P(A \cup B)} = \frac{P(A)}{P(A) + P(B)}.$$


Example 2.2. A pair of four-sided dice is rolled and the sum is determined. 
What is the probability that a sum of 3 is rolled before a sum of 5 is rolled 
in a sequence of rolls of the dice? 
Answer: The sample space of this random experiment is

S = {(1,1) (1,2) (1,3) (1,4)
     (2,1) (2,2) (2,3) (2,4)
     (3,1) (3,2) (3,3) (3,4)
     (4,1) (4,2) (4,3) (4,4)}.


Let A denote the event of getting a sum of 3 and B denote the event of
getting a sum of 5. The probability that a sum of 3 is rolled before a sum
of 5 is rolled can be thought of as the conditional probability of a sum of 3,
given that a sum of 3 or 5 has occurred. That is, $P(A/A \cup B)$. Hence

$$P(A/A \cup B) = \frac{P(A \cap (A \cup B))}{P(A \cup B)} = \frac{P(A)}{P(A) + P(B)} = \frac{N(A)}{N(A) + N(B)} = \frac{2}{2 + 4} = \frac{1}{3}.$$
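The two routes to P(A before B), the geometric series and the conditional probability, can be compared numerically. The sketch below uses the dice of Example 2.2; the function name `prob_a_before_b` and the cutoff of 200 terms are our own choices for illustration.

```python
from fractions import Fraction
from itertools import product

def prob_a_before_b(p_a, p_b):
    """Closed form P(A) / (P(A) + P(B)) for mutually disjoint events A and B."""
    return p_a / (p_a + p_b)

# Example 2.2: two four-sided dice; A = "sum is 3", B = "sum is 5".
rolls = list(product(range(1, 5), repeat=2))
p_a = Fraction(sum(1 for x, y in rolls if x + y == 3), len(rolls))
p_b = Fraction(sum(1 for x, y in rolls if x + y == 5), len(rolls))

closed_form = prob_a_before_b(p_a, p_b)

# Partial sum P(A) (1 + r + ... + r^199) of the series, with r = 1 - P(A) - P(B).
r = 1 - p_a - p_b
partial_sum = sum(p_a * r**n for n in range(200))

print(closed_form)
print(float(closed_form - partial_sum))  # remaining tail of the series
```

The partial sums increase toward the closed form, and the printed difference is the tail of the series that has not yet been added.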


Example 2.3. If we randomly pick two television sets in succession from a
shipment of 240 television sets of which 15 are defective, what is the
probability that they will be both defective?

Answer: Let A denote the event that the first television picked was defective.
Let B denote the event that the second television picked was defective. Then
$A \cap B$ will denote the event that both televisions picked were defective. Using
the conditional probability, we can calculate

$$P(A \cap B) = P(A)\,P(B/A) = \left(\frac{15}{240}\right)\left(\frac{14}{239}\right) = \frac{7}{1912}.$$


In Example 2.3, we assume that we are sampling without replacement. 

Definition 2.2. If an object is selected and then replaced before the next 
object is selected, this is known as sampling with replacement. Otherwise, it 
is called sampling without replacement. 
Rolling a die is equivalent to sampling with replacement, whereas dealing
a deck of cards to players is sampling without replacement. 

Example 2.4. A box of fuses contains 20 fuses, of which 5 are defective. If 
3 of the fuses are selected at random and removed from the box in succession 
without replacement, what is the probability that all three fuses are defective? 

Answer: Let A be the event that the first fuse selected is defective. Let B
be the event that the second fuse selected is defective. Let C be the event
that the third fuse selected is defective. The probability that all three fuses
selected are defective is $P(A \cap B \cap C)$. Hence

$$P(A \cap B \cap C) = P(A)\,P(B/A)\,P(C/A \cap B) = \left(\frac{5}{20}\right)\left(\frac{4}{19}\right)\left(\frac{3}{18}\right) = \frac{1}{114}.$$
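Examples 2.3 and 2.4 both apply the multiplication rule to sampling without replacement. A minimal sketch of that rule follows; the helper `prob_all_defective` is a hypothetical name introduced here for illustration.

```python
from fractions import Fraction

def prob_all_defective(total, defective, draws):
    """P(every one of `draws` items is defective), sampling without replacement.

    Multiplication rule: the k-th factor is the conditional probability of
    a defective item given that all earlier draws were defective.
    """
    prob = Fraction(1)
    for k in range(draws):
        prob *= Fraction(defective - k, total - k)
    return prob

print(prob_all_defective(240, 15, 2))  # Example 2.3: television sets
print(prob_all_defective(20, 5, 3))    # Example 2.4: fuses
```

Each pass through the loop removes one defective item from both the numerator and the denominator, which is exactly what distinguishes this from sampling with replacement.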


Definition 2.3. Two events A and B of a sample space S are called
independent if and only if

$$P(A \cap B) = P(A)\,P(B).$$

Example 2.5. The following diagram shows two events A and B in the 
sample space S. Are the events A and B independent? 



Answer: There are 10 black dots in S and event A contains 4 of these dots.
So the probability of A is $P(A) = \frac{4}{10}$. Similarly, event B contains 5 black
dots. Hence $P(B) = \frac{5}{10}$. The conditional probability of A given B is

$$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{2}{5}.$$

This shows that P(A/B) = P(A). Hence A and B are independent.

Theorem 2.1. Let $A, B \subset S$. If A and B are independent and P(B) > 0,
then

$$P(A/B) = P(A).$$

Proof:

$$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)\,P(B)}{P(B)} = P(A).$$


Theorem 2.2. If A and B are independent events, then $A^c$ and B are
independent. Similarly, A and $B^c$ are independent.

Proof: We know that A and B are independent, that is

$$P(A \cap B) = P(A)\,P(B),$$

and we want to show that $A^c$ and B are independent, that is

$$P(A^c \cap B) = P(A^c)\,P(B).$$

Since

$$\begin{aligned}
P(A^c \cap B) &= P(A^c/B)\,P(B) \\
&= [1 - P(A/B)]\,P(B) \\
&= P(B) - P(A/B)\,P(B) \\
&= P(B) - P(A \cap B) \\
&= P(B) - P(A)\,P(B) \\
&= P(B)\,[1 - P(A)] \\
&= P(B)\,P(A^c),
\end{aligned}$$

the events $A^c$ and B are independent. Similarly, it can be shown that A and
$B^c$ are independent and the proof is now complete.
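The statement of the theorem can be checked on a small finite sample space. The sketch below assumes a hypothetical experiment of our own choosing (one coin flip followed by one roll of a fair die, all outcomes equally likely); it illustrates the result and does not replace the proof.

```python
from fractions import Fraction
from itertools import product

# Hypothetical experiment: one coin flip followed by one roll of a fair die.
S = set(product("HT", range(1, 7)))

def P(event):
    """Probability of an event (a subset of S) under equally likely outcomes."""
    return Fraction(len(event), len(S))

A = {s for s in S if s[0] == "H"}       # heads on the coin
B = {s for s in S if s[1] in (2, 3)}    # a 2 or a 3 on the die
A_c, B_c = S - A, S - B                 # complements of A and B

def independent(E, F):
    """Product rule P(E and F) = P(E) P(F)."""
    return P(E & F) == P(E) * P(F)

print(independent(A, B), independent(A_c, B), independent(A, B_c))
```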

Remark 2.1. The concept of independence is fundamental. In fact, it is this
concept that justifies the mathematical development of probability as a separate
discipline from measure theory. Mark Kac said, “independence of events
is not a purely mathematical concept.” It can, however, be made plausible
that it should be interpreted by the rule of multiplication of probabilities and
this leads to the mathematical definition of independence.

Example 2.6. Flip a coin and then independently cast a die. What is the 
probability of observing heads on the coin and a 2 or 3 on the die? 

Answer: Let A denote the event of observing a head on the coin and let B
be the event of observing a 2 or 3 on the die. Then

$$P(A \cap B) = P(A)\,P(B) = \left(\frac{1}{2}\right)\left(\frac{2}{6}\right) = \frac{1}{6}.$$


Example 2.7. An urn contains 3 red, 2 white and 4 yellow balls. An 
ordered sample of size 3 is drawn from the urn. If the balls are drawn with 
replacement so that one outcome does not change the probabilities of others, 
then what is the probability of drawing a sample that has balls of each color? 
Also, find the probability of drawing a sample that has two yellow balls and 
a red ball or a red ball and two white balls? 


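Answer: If the balls are drawn with replacement, then

$$P(RWY) = \left(\frac{3}{9}\right)\left(\frac{2}{9}\right)\left(\frac{4}{9}\right) = \frac{8}{243}$$

and

$$P(YYR \text{ or } RWW) = \left(\frac{4}{9}\right)\left(\frac{4}{9}\right)\left(\frac{3}{9}\right) + \left(\frac{3}{9}\right)\left(\frac{2}{9}\right)\left(\frac{2}{9}\right) = \frac{20}{243}.$$

If the balls are drawn without replacement, then

$$P(RWY) = \left(\frac{3}{9}\right)\left(\frac{2}{8}\right)\left(\frac{4}{7}\right) = \frac{1}{21}$$

and

$$P(YYR \text{ or } RWW) = \left(\frac{4}{9}\right)\left(\frac{3}{8}\right)\left(\frac{3}{7}\right) + \left(\frac{3}{9}\right)\left(\frac{2}{8}\right)\left(\frac{1}{7}\right) = \frac{7}{84}.$$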
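The four answers of Example 2.7 can be reproduced with one small routine. The sketch assumes, as the stated answers suggest, that RWY, YYR and RWW each denote one specific order of draws; the function name `sequence_probability` is ours.

```python
from fractions import Fraction

URN = {"R": 3, "W": 2, "Y": 4}  # ball counts from Example 2.7

def sequence_probability(colors, replace):
    """Probability of drawing the given colors in exactly this order."""
    counts = dict(URN)
    total = sum(counts.values())
    prob = Fraction(1)
    for c in colors:
        prob *= Fraction(counts[c], total)
        if not replace:          # the drawn ball leaves the urn
            counts[c] -= 1
            total -= 1
    return prob

with_repl = sequence_probability("RWY", True)
with_repl_two = (sequence_probability("YYR", True)
                 + sequence_probability("RWW", True))
without_repl = sequence_probability("RWY", False)
without_repl_two = (sequence_probability("YYR", False)
                    + sequence_probability("RWW", False))

print(with_repl, with_repl_two, without_repl, without_repl_two)
```

The two sequences YYR and RWW are mutually exclusive, so their probabilities are simply added. `Fraction` reduces 7/84 to 1/12 when printing.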


There is a tendency to equate the concepts “mutually exclusive” and
“independence”. This is a fallacy. Two events A and B are mutually exclusive if
$A \cap B = \emptyset$ and they are called possible if $P(A) \neq 0 \neq P(B)$.

Theorem 2.2. Two possible mutually exclusive events are always dependent 
(that is not independent). 
Proof: Suppose not. Then

$$\begin{aligned}
P(A \cap B) &= P(A)\,P(B) \\
P(\emptyset) &= P(A)\,P(B) \\
0 &= P(A)\,P(B).
\end{aligned}$$

Hence, we get either P(A) = 0 or P(B) = 0. This is a contradiction to the 
fact that A and B are possible events. This completes the proof. 

Theorem 2.3. Two possible independent events are not mutually exclusive. 

Proof: Let A and B be two independent events and suppose A and B are
mutually exclusive. Then

$$P(A)\,P(B) = P(A \cap B) = P(\emptyset) = 0.$$

Therefore, we get either P(A) = 0 or P(B) = 0. This is a contradiction to
the fact that A and B are possible events.

In summary, for possible events A and B: if A and B are mutually exclusive,
then they are not independent; and if A and B are independent, then they are
not mutually exclusive.

2.2. Bayes’ Theorem 

There are many situations where the ultimate outcome of an experiment
depends on what happens in various intermediate stages. This issue is
resolved by Bayes’ Theorem.

Definition 2.4. Let S be a set and let $\mathcal{P} = \{A_i\}_{i=1}^{m}$ be a collection of subsets
of S. The collection $\mathcal{P}$ is called a partition of S if

(a) $S = \bigcup_{i=1}^{m} A_i$

(b) $A_i \cap A_j = \emptyset$ for $i \neq j$.
Theorem 2.4. If the events $\{B_i\}_{i=1}^{m}$ constitute a partition of the sample
space S and $P(B_i) \neq 0$ for $i = 1, 2, \ldots, m$, then for any event A in S

$$P(A) = \sum_{i=1}^{m} P(B_i)\,P(A/B_i).$$

Proof: Let S be a sample space and A be an event in S. Let $\{B_i\}_{i=1}^{m}$ be
any partition of S. Then

$$A = \bigcup_{i=1}^{m} (A \cap B_i).$$

Thus

$$P(A) = \sum_{i=1}^{m} P(A \cap B_i) = \sum_{i=1}^{m} P(B_i)\,P(A/B_i).$$


Theorem 2.5. If the events $\{B_i\}_{i=1}^{m}$ constitute a partition of the sample
space S and $P(B_i) \neq 0$ for $i = 1, 2, \ldots, m$, then for any event A in S such
that $P(A) \neq 0$

$$P(B_k/A) = \frac{P(B_k)\,P(A/B_k)}{\sum_{i=1}^{m} P(B_i)\,P(A/B_i)}, \qquad k = 1, 2, \ldots, m.$$

Proof: Using the definition of conditional probability, we get

$$P(B_k/A) = \frac{P(A \cap B_k)}{P(A)}.$$

Using Theorem 2.4, we get

$$P(B_k/A) = \frac{P(A \cap B_k)}{\sum_{i=1}^{m} P(B_i)\,P(A/B_i)} = \frac{P(B_k)\,P(A/B_k)}{\sum_{i=1}^{m} P(B_i)\,P(A/B_i)}.$$

This completes the proof.

This theorem is called Bayes’ Theorem. The probability $P(B_k)$ is called the
prior probability. The probability $P(B_k/A)$ is called the posterior probability.

Example 2.8. Two boxes containing marbles are placed on a table. The
boxes are labeled $B_1$ and $B_2$. Box $B_1$ contains 7 green marbles and 4 white
marbles. Box $B_2$ contains 3 green marbles and 10 yellow marbles. The
boxes are arranged so that the probability of selecting box $B_1$ is $\frac{1}{3}$ and the
probability of selecting box $B_2$ is $\frac{2}{3}$. Kathy is blindfolded and asked to select
a marble. She will win a color TV if she selects a green marble. (a) What is
the probability that Kathy will win the TV (that is, she will select a green
marble)? (b) If Kathy wins the color TV, what is the probability that the
green marble was selected from the first box?

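Answer: Let A be the event of drawing a green marble. The prior
probabilities are $P(B_1) = \frac{1}{3}$ and $P(B_2) = \frac{2}{3}$.

(a) The probability that Kathy will win the TV is

$$\begin{aligned}
P(A) &= P(A \cap B_1) + P(A \cap B_2) \\
&= P(A/B_1)\,P(B_1) + P(A/B_2)\,P(B_2) \\
&= \left(\frac{7}{11}\right)\left(\frac{1}{3}\right) + \left(\frac{3}{13}\right)\left(\frac{2}{3}\right) \\
&= \frac{7}{33} + \frac{2}{13} \\
&= \frac{91}{429} + \frac{66}{429} \\
&= \frac{157}{429}.
\end{aligned}$$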


(b) Given that Kathy won the TV, the probability that the green marble was
selected from $B_1$ is

$$P(B_1/A) = \frac{P(A/B_1)\,P(B_1)}{P(A/B_1)\,P(B_1) + P(A/B_2)\,P(B_2)} = \frac{\left(\frac{7}{11}\right)\left(\frac{1}{3}\right)}{\left(\frac{7}{11}\right)\left(\frac{1}{3}\right) + \left(\frac{3}{13}\right)\left(\frac{2}{3}\right)} = \frac{91}{157}.$$

Note that $P(A/B_1)$ is the probability of selecting a green marble from
$B_1$ whereas $P(B_1/A)$ is the probability that the green marble was selected
from box $B_1$.
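Theorems 2.4 and 2.5 translate directly into two short functions. The following is a sketch under stated assumptions: the function names are ours, and the priors 1/3 and 2/3 for Example 2.8 are inferred from the terms 7/33 and 2/13 in the worked answer.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(A) = sum of P(B_i) P(A/B_i) over a partition B_1, ..., B_m."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def posterior(priors, likelihoods, k):
    """Bayes' theorem: P(B_k/A) for the k-th (0-based) member of the partition."""
    return priors[k] * likelihoods[k] / total_probability(priors, likelihoods)

# Example 2.8. Priors are an assumption inferred from the worked answer.
priors = [Fraction(1, 3), Fraction(2, 3)]          # P(B_1), P(B_2)
likelihoods = [Fraction(7, 11), Fraction(3, 13)]   # P(A/B_1), P(A/B_2)

print(total_probability(priors, likelihoods))   # part (a)
print(posterior(priors, likelihoods, 0))        # part (b)
```

The denominator of the posterior is the same total probability computed in part (a), so the two parts of the example share most of their arithmetic.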

Example 2.9. Suppose box A contains 4 red and 5 blue chips and box B 
contains 6 red and 3 blue chips. A chip is chosen at random from the box A 
and placed in box B. Finally, a chip is chosen at random from among those 
now in box B. What is the probability a blue chip was transferred from box 
A to box B given that the chip chosen from box B is red? 

Answer: Let E represent the event of moving a blue chip from box A to box
B, and let R represent the event that the chip chosen from box B is red. We
want to find P(E/R), the probability that a blue chip was moved from box
A to box B given that the chip chosen from B was red. The probability of
moving a red chip from box A is $P(E^c) = \frac{4}{9}$ and the probability of moving
a blue chip from box A is $P(E) = \frac{5}{9}$. If a red chip was moved from box A to
box B, then box B has 7 red chips and 3 blue chips. Thus the probability
of choosing a red chip from box B is $P(R/E^c) = \frac{7}{10}$. Similarly, if a blue chip
was moved from box A to box B, then the probability of choosing a red chip
from box B is $P(R/E) = \frac{6}{10}$.

Hence, the probability that a blue chip was transferred from box A to box B
given that the chip chosen from box B is red is given by

$$P(E/R) = \frac{P(R/E)\,P(E)}{P(R)} = \frac{\left(\frac{6}{10}\right)\left(\frac{5}{9}\right)}{\left(\frac{7}{10}\right)\left(\frac{4}{9}\right) + \left(\frac{6}{10}\right)\left(\frac{5}{9}\right)} = \frac{15}{29}.$$
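The two-stage structure of Example 2.9 can be laid out as a table of joint probabilities. In this sketch the transfer probabilities 4/9 and 5/9 and the red-chip probabilities 7/10 and 6/10 are computed from the chip counts in the statement of the example; the dictionary names are illustrative.

```python
from fractions import Fraction

# Box A holds 4 red and 5 blue chips; box B holds 6 red and 3 blue chips.
p_move = {"red": Fraction(4, 9), "blue": Fraction(5, 9)}   # chip moved from A to B

# After the transfer box B holds 10 chips: 7 red if a red chip arrived, else 6.
p_red_from_b = {"red": Fraction(7, 10), "blue": Fraction(6, 10)}

# Joint probability of (color moved, red chip then drawn from B).
joint = {color: p_move[color] * p_red_from_b[color] for color in p_move}

p_red = sum(joint.values())                      # total probability of red from B
p_blue_moved_given_red = joint["blue"] / p_red   # Bayes' theorem

print(p_red, p_blue_moved_given_red)
```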


Example 2.10. Sixty percent of new drivers have had driver education. 
During their first year, new drivers without driver education have probability 
0.08 of having an accident, but new drivers with driver education have only a 
0.05 probability of an accident. What is the probability a new driver has had 
driver education, given that the driver has had no accident the first year? 

Answer: Let A be the event that a new driver has had driver education and
B the event that a new driver has had an accident in the first year. Let $A^c$
and $B^c$ be the complements of A and B, respectively. We want to find the
probability that a new driver has had driver education, given that the driver
has had no accidents in the first year, that is $P(A/B^c)$.

$$\begin{aligned}
P(A/B^c) &= \frac{P(A \cap B^c)}{P(B^c)} \\
&= \frac{P(B^c/A)\,P(A)}{P(B^c/A)\,P(A) + P(B^c/A^c)\,P(A^c)} \\
&= \frac{[1 - P(B/A)]\,P(A)}{[1 - P(B/A)]\,P(A) + [1 - P(B/A^c)]\,[1 - P(A)]} \\
&= \frac{\left(\frac{60}{100}\right)\left(\frac{95}{100}\right)}{\left(\frac{40}{100}\right)\left(\frac{92}{100}\right) + \left(\frac{60}{100}\right)\left(\frac{95}{100}\right)} \\
&= 0.6077.
\end{aligned}$$
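The arithmetic of Example 2.10 is easy to verify in floating point. The variable names below are ours; the three input probabilities come from the statement of the example.

```python
p_edu = 0.60                # P(A): the driver has had driver education
p_acc_given_edu = 0.05      # P(B/A)
p_acc_given_no_edu = 0.08   # P(B/A^c)

# The two ways of having no accident in the first year.
no_acc_and_edu = (1 - p_acc_given_edu) * p_edu
no_acc_and_no_edu = (1 - p_acc_given_no_edu) * (1 - p_edu)

p_edu_given_no_acc = no_acc_and_edu / (no_acc_and_edu + no_acc_and_no_edu)
print(round(p_edu_given_no_acc, 4))
```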

Example 2.11. One-half percent of the population has AIDS. There is a
test to detect AIDS. A positive test result is supposed to mean that you
have AIDS but the test is not perfect. For people with AIDS, the test misses
the diagnosis 2% of the time. And for the people without AIDS, the test
incorrectly tells 3% of them that they have AIDS. (a) What is the probability
that a person picked at random will test positive? (b) What is the probability
that you have AIDS given that your test comes back positive?


Answer: Let A denote the event of one who has AIDS and let B denote the 
event that the test comes out positive. 

(a) The probability that a person picked at random will test positive is 
given by 

P(test positive) = (0.005) (0.98) + (0.995) (0.03) 

= 0.0049 + 0.0298 = 0.035. 

(b) The probability that you have AIDS given that your test comes back 
positive is given by 


$$P(A/B) = \frac{\text{favorable positive branches}}{\text{total positive branches}} = \frac{(0.005)(0.98)}{(0.005)(0.98) + (0.995)(0.03)} \approx 0.14.$$
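Both parts of Example 2.11 follow from the same two branch products. A minimal sketch follows; the function and parameter names (`positive_predictive_value`, `sensitivity`, `false_positive_rate`) are our own labels for the quantities in the example.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """P(condition / positive test) via Bayes' theorem."""
    true_pos = prevalence * sensitivity                   # has it, tests positive
    false_pos = (1 - prevalence) * false_positive_rate    # does not, tests positive
    return true_pos / (true_pos + false_pos)

# Numbers from Example 2.11: 0.5% prevalence, 2% misses, 3% false positives.
p_positive = 0.005 * 0.98 + 0.995 * 0.03
ppv = positive_predictive_value(0.005, 0.98, 0.03)

print(round(p_positive, 3), round(ppv, 2))
```

The posterior is small because the false positives from the large group without the condition outnumber the true positives from the small group that has it.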



Remark 2.2. This example illustrates why Bayes’ theorem is so important. 
What we would really like to know in this situation is a first-stage result: Do 
you have AIDS? But we cannot get this information without an autopsy. The 
first stage is hidden. But the second stage is not hidden. The best we can 
do is make a prediction about the first stage. This illustrates why backward 
conditional probabilities are so useful. 
2.3. Review Exercises

1. Let P(A) = 0.4 and P(A U B) = 0.6. For what value of P(B) are A and 
B independent? 

2. A die is loaded in such a way that the probability of the face with j dots
turning up is proportional to j for j = 1, 2, 3, 4, 5, 6. In 6 independent throws
of this die, what is the probability that each face turns up exactly once?

3. A system engineer is interested in assessing the reliability of a rocket 
composed of three stages. At take off, the engine of the first stage of the 
rocket must lift the rocket off the ground. If that engine accomplishes its 
task, the engine of the second stage must now lift the rocket into orbit. Once 
the engines in both stages 1 and 2 have performed successfully, the engine 
of the third stage is used to complete the rocket’s mission. The reliability of 
the rocket is measured by the probability of the completion of the mission. If 
the probabilities of successful performance of the engines of stages 1, 2 and 
3 are 0.99, 0.97 and 0.98, respectively, find the reliability of the rocket. 

4. Identical twins come from the same egg and hence are of the same sex. 
Fraternal twins have a 50-50 chance of being the same sex. Among twins the 
probability of a fraternal set is | and an identical set is |. If the next set of 
twins are of the same sex, what is the probability they are identical? 

5. In rolling a pair of fair dice, what is the probability that a sum of 7 is 
rolled before a sum of 8 is rolled?

6. A card is drawn at random from an ordinary deck of 52 cards and replaced.
This is done a total of 5 independent times. What is the conditional
probability of drawing the ace of spades exactly 4 times, given that this ace
is drawn at least 4 times?

7. Let A and B be independent events with P(A) = P(B) and $P(A \cup B)$ =
0.5. What is the probability of the event A?

8. An urn contains 6 red balls and 3 blue balls. One ball is selected at 
random and is replaced by a ball of the other color. A second ball is then 
chosen. What is the conditional probability that the first ball selected is red, 
given that the second ball was red? 
9. A family has five children. Assuming that the probability of a girl on
each birth was 0.5 and that the five births were independent, what is the 
probability the family has at least one girl, given that they have at least one 
boy? 

10. An urn contains 4 balls numbered 0 through 3. One ball is selected at
random and removed from the urn and not replaced. All balls with nonzero 
numbers less than that of the selected ball are also removed from the urn. 
Then a second ball is selected at random from those remaining in the urn. 
What is the probability that the second ball selected is numbered 3? 

11. English and American spelling are rigour and rigor, respectively. A man 
staying at Al Rashid hotel writes this word, and a letter taken at random from
his spelling is found to be a vowel. If 40 percent of the English-speaking men 
at the hotel are English and 60 percent are American, what is the probability 
that the writer is an Englishman? 

12. A diagnostic test for a certain disease is said to be 90% accurate in that,
if a person has the disease, the test will detect it with probability 0.9. Also, if
a person does not have the disease, the test will report that he or she doesn’t 
have it with probability 0.9. Only 1% of the population has the disease in 
question. If the diagnostic test reports that a person chosen at random from 
the population has the disease, what is the conditional probability that the 
person, in fact, has the disease? 

13. A small grocery store had 10 cartons of milk, 2 of which were sour. If 
you are going to buy the 6th carton of milk sold that day at random, find
the probability of selecting a carton of sour milk. 

14. Suppose Q and S are independent events such that the probability that 
at least one of them occurs is | and the probability that Q occurs but S does 
not occur is ^. What is the probability of S ? 

15. A box contains 2 green and 3 white balls. A ball is selected at random 
from the box. If the ball is green, a card is drawn from a deck of 52 cards. 
If the ball is white, a card is drawn from the deck consisting of just the 16 
pictures. (a) What is the probability of drawing a king? (b) What is the
probability that a white ball was selected given that a king was drawn?
16. Five urns are numbered 3, 4, 5, 6 and 7, respectively. Inside each urn is
n^2 dollars where n is the number on the urn. The following experiment is
performed: An urn is selected at random. If its number is a prime number the 
experimenter receives the amount in the urn and the experiment is over. If its 
number is not a prime number, a second urn is selected from the remaining 
four and the experimenter receives the total amount in the two urns selected. 
What is the probability that the experimenter ends up with exactly
twenty-five dollars?

17. A cookie jar has 3 red marbles and 1 white marble. A shoebox has 1 red 
marble and 1 white marble. Three marbles are chosen at random without 
replacement from the cookie jar and placed in the shoebox. Then 2 marbles 
are chosen at random and without replacement from the shoebox. What is 
the probability that both marbles chosen from the shoebox are red? 

18. An urn contains n black balls and n white balls. Three balls are chosen
from the urn at random and without replacement. What is the value of n if 
the probability is A that all three balls are white? 

19. An urn contains 10 balls numbered 1 through 10. Five balls are drawn 
at random and without replacement. Let A be the event that “Exactly two 
odd-numbered balls are drawn and they occur on odd-numbered draws from 
the urn.” What is the probability of event A? 

20. I have five envelopes numbered 3, 4, 5, 6, 7 all hidden in a box. I 
pick an envelope - if it is prime then I get the square of that number in 
dollars. Otherwise (without replacement) I pick another envelope and then 
get the sum of squares of the two envelopes I picked (in dollars). What is 
the probability that I will get $25? 
Chapter 3

RANDOM VARIABLES AND DISTRIBUTION FUNCTIONS


3.1. Introduction 

In many random experiments, the elements of sample space are not necessarily
numbers. For example, in a coin tossing experiment the sample space
consists of

S = {Head, Tail}. 

Statistical methods involve primarily numerical data. Hence, one has to 
‘mathematize’ the outcomes of the sample space. This mathematization, or 
quantification, is achieved through the notion of random variables. 

Definition 3.1. Consider a random experiment whose sample space is S. A
random variable X is a function from the sample space S into the set of real
numbers $\mathbb{R}$ such that for each interval I in $\mathbb{R}$, the set $\{s \in S \mid X(s) \in I\}$ is an
event in S.

In a particular experiment a random variable X would be some function
that assigns a real number X(s) to each possible outcome s in the sample
space. Given a random experiment, there can be many random variables.
This is due to the fact that given two (finite) sets A and B, the number
of distinct functions one can come up with from A into B is $|B|^{|A|}$. Here |A| means the
cardinality of the set A.

A random variable is not a variable. Also, it is not random. Thus someone
named it inappropriately. The following analogy describes the role of the
random variable. A random variable is like the Holy Roman Empire: it was
not holy, it was not Roman, and it was not an empire. A random variable is
neither random nor variable; it is simply a function. The values it takes on
are both random and variable.

Definition 3.2. The set $\{x \in \mathbb{R} \mid x = X(s),\ s \in S\}$ is called the space of the
random variable X.

The space of the random variable X will be denoted by $R_X$. The space
of the random variable X is actually the range of the function $X : S \to \mathbb{R}$.

Example 3.1. Consider the coin tossing experiment. Construct a random
variable X for this experiment. What is the space of this random variable
X?

Answer: The sample space of this experiment is given by 

S = {Head, Tail}. 

Let us define a function from S into the set of reals as follows 

X(Head) = 0 
X(Tail) = 1. 

Then X is a valid map and thus by our definition of random variable, it is a
random variable for the coin tossing experiment. The space of this random
variable is

$R_X = \{0, 1\}$.



Example 3.2. Consider an experiment in which a coin is tossed ten times. 
What is the sample space of this experiment? How many elements are in this 
sample space? Define a random variable for this sample space and then find 
the space of the random variable. 
Answer: The sample space of this experiment is given by

S = {s | s is a sequence of 10 heads or tails}.

The cardinality of S is

$|S| = 2^{10}$.

Let $X : S \to \mathbb{R}$ be a function from the sample space S into the set of reals $\mathbb{R}$
defined as follows:

X(s) = number of heads in sequence s.

Then X is a random variable. This random variable, for example, maps the
sequence HHTTTHTTHH to the real number 5, that is

X(HHTTTHTTHH) = 5.

The space of this random variable is

$R_X = \{0, 1, 2, \ldots, 10\}$.
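Since a random variable is simply a function on the sample space, it can be written as one. The sketch below builds the sample space of Example 3.2 explicitly and computes the space of X as the range of that function; representing outcomes as strings of "H" and "T" is our choice.

```python
from itertools import product

def X(s):
    """Random variable of Example 3.2: the number of heads in the sequence s."""
    return s.count("H")

# Sample space: all sequences of 10 heads or tails.
S = ["".join(seq) for seq in product("HT", repeat=10)]

# The space of X is the range of the function X on S.
space = sorted({X(s) for s in S})

print(len(S))
print(X("HHTTTHTTHH"))
print(space)
```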


Now, we introduce some notation. By (X = x) we mean the event
$\{s \in S \mid X(s) = x\}$. Similarly, (a < X < b) means the event $\{s \in S \mid a < X(s) < b\}$
of the sample space S. These are illustrated in the following diagrams.




There are three types of random variables: discrete, continuous, and
mixed. However, in most applications we encounter either discrete or continuous
random variables. In this book we only treat these two types of random
variables. First, we consider the discrete case and then we examine the
continuous case.

Definition 3.3. If the space of random variable X is countable, then X is
called a discrete random variable.







3.2. Distribution Functions of Discrete Random Variables 

Every random variable is characterized through its probability density 
function. 

Definition 3.4. Let $R_X$ be the space of the random variable X. The
function $f : R_X \to \mathbb{R}$ defined by

$$f(x) = P(X = x)$$

is called the probability density function (pdf) of X.

Example 3.3. In an introductory statistics class of 50 students, there are 11
freshmen, 19 sophomores, 14 juniors and 6 seniors. One student is selected at
random. What is the sample space of this experiment? Construct a random 
variable X for this sample space and then find its space. Further, find the 
probability density function of this random variable X. 

Answer: The sample space of this random experiment is 

S={Fr, So, Jr, Sr}. 

Define a function $X : S \to \mathbb{R}$ as follows:

X(Fr) = 1, X(So) = 2
X(Jr) = 3, X(Sr) = 4.

Then clearly X is a random variable in S. The space of X is given by

$R_X = \{1, 2, 3, 4\}$.

The probability density function of X is given by

$$\begin{aligned}
f(1) &= P(X = 1) = \frac{11}{50} \\
f(2) &= P(X = 2) = \frac{19}{50} \\
f(3) &= P(X = 3) = \frac{14}{50} \\
f(4) &= P(X = 4) = \frac{6}{50}.
\end{aligned}$$

Example 3.4. A box contains 5 colored balls, 2 black and 3 white. Balls 
are drawn successively without replacement. If the random variable X is the 
number of draws until the last black ball is obtained, find the probability
density function for the random variable X. 


Answer: Let ‘B’ denote the black ball, and ‘W’ denote the white ball. Then 
the sample space S of this experiment is given by (see the figure below) 



S = { BB, BWB , WBB, BWWB , WBWB , WWBB, 

BWWWB , WWBWB , WWWBB , WBWWB}. 


Hence the sample space has 10 points, that is |S| = 10. It is easy to see that
the space of the random variable X is {2, 3, 4, 5}.

Therefore, the probability density function of X is given by

$$\begin{aligned}
f(2) &= P(X = 2) = \frac{1}{10}, & f(3) &= P(X = 3) = \frac{2}{10}, \\
f(4) &= P(X = 4) = \frac{3}{10}, & f(5) &= P(X = 5) = \frac{4}{10}.
\end{aligned}$$

Thus

$$f(x) = \frac{x - 1}{10}, \qquad x = 2, 3, 4, 5.$$
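The density in Example 3.4 can be confirmed by listing every order in which the five balls may be drawn. This sketch assumes all 5! orders are equally likely; the ball labels and the helper `draws_until_last_black` are illustrative names of our own.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

balls = ["B1", "B2", "W1", "W2", "W3"]   # 2 black, 3 white (labels illustrative)

def draws_until_last_black(order):
    """1-based position at which the second (last) black ball appears."""
    return max(i for i, ball in enumerate(order, start=1) if ball.startswith("B"))

orders = list(permutations(balls))        # 5! equally likely draw orders
counts = Counter(draws_until_last_black(o) for o in orders)
pdf = {x: Fraction(n, len(orders)) for x, n in sorted(counts.items())}

print(pdf)
```

Drawing past the last black ball does not change the value of X, so enumerating full orders of all five balls gives the same distribution as stopping early.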

Example 3.5. A pair of dice consisting of a six-sided die and a four-sided 
die is rolled and the sum is determined. Let the random variable X denote 
this sum. Find the sample space, the space of the random variable, and 
probability density function of X. 

Answer: The sample space of this random experiment is given by

S = {(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
     (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
     (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
     (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)}.

The space of the random variable X is given by

$R_X = \{2, 3, 4, 5, 6, 7, 8, 9, 10\}$.

Therefore, the probability density function of X is given by

$$\begin{aligned}
f(2) &= P(X = 2) = \frac{1}{24}, & f(3) &= P(X = 3) = \frac{2}{24}, \\
f(4) &= P(X = 4) = \frac{3}{24}, & f(5) &= P(X = 5) = \frac{4}{24}, \\
f(6) &= P(X = 6) = \frac{4}{24}, & f(7) &= P(X = 7) = \frac{4}{24}, \\
f(8) &= P(X = 8) = \frac{3}{24}, & f(9) &= P(X = 9) = \frac{2}{24}, \\
f(10) &= P(X = 10) = \frac{1}{24}.
\end{aligned}$$
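The density of the sum in Example 3.5 can be tabulated by counting outcomes. In this sketch each outcome is written as (four-sided, six-sided), matching the listing of the sample space; the variable names are ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 24 equally likely outcomes: (four-sided die, six-sided die).
outcomes = list(product(range(1, 5), range(1, 7)))

# Number of outcomes giving each sum, then the pdf f(x) = count / 24.
counts = Counter(a + b for a, b in outcomes)
pdf = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x in pdf:
    print(x, f"{counts[x]}/24")
```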

Example 3.6. A fair coin is tossed 3 times. Let the random variable X 
denote the number of heads in 3 tosses of the coin. Find the sample space, 
the space of the random variable, and the probability density function of X. 

Answer: The sample space S of this experiment consists of all binary
sequences of length 3, that is

S = {TTT, TTH, THT, HTT, THH, HTH, HHT, HHH}.

The space of this random variable is given by

$$R_X = \{0, 1, 2, 3\}.$$

Therefore, the probability density function of X is given by

$$f(0) = P(X = 0) = \frac{1}{8}, \qquad f(1) = P(X = 1) = \frac{3}{8},$$

$$f(2) = P(X = 2) = \frac{3}{8}, \qquad f(3) = P(X = 3) = \frac{1}{8}.$$

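This can be written as follows:

$$f(x) = \binom{3}{x} \left(\frac{1}{2}\right)^{x} \left(\frac{1}{2}\right)^{3-x}, \qquad x = 0, 1, 2, 3.$$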



The probability density function f(x) of a random variable X completely
characterizes it. Some basic properties of a discrete probability density
function are summarized below.

Theorem 3.1. If X is a discrete random variable with space $R_X$ and
probability density function f(x), then

(a) $f(x) \geq 0$ for all x in $R_X$, and

(b) $\sum_{x \in R_X} f(x) = 1$.

Example 3.7. If the probability density function of a random variable X with
space $R_X = \{1, 2, 3, \ldots, 12\}$ is given by

$$f(x) = k\,(2x - 1),$$

then what is the value of the constant k?

Answer:

$$\begin{aligned}
1 &= \sum_{x \in R_X} f(x) \\
  &= \sum_{x \in R_X} k\,(2x - 1) \\
  &= \sum_{x=1}^{12} k\,(2x - 1) \\
  &= k \left[ 2 \sum_{x=1}^{12} x - 12 \right] \\
  &= k \left[ 2\,\frac{(12)(13)}{2} - 12 \right] \\
  &= k \cdot 144.
\end{aligned}$$

Hence

$$k = \frac{1}{144}.$$


Definition 3.5. The cumulative distribution function F(x) of a random
variable X is defined as

$$F(x) = P(X \leq x)$$

for all real numbers x.

Theorem 3.2. If X is a random variable with the space $R_X$, then

$$F(x) = \sum_{t \leq x} f(t)$$

for $x \in R_X$.

Example 3.8. If the probability density function of the random variable X
is given by

$$f(x) = \frac{1}{144}\,(2x - 1) \qquad \text{for } x = 1, 2, 3, \ldots, 12,$$

then find the cumulative distribution function of X.

Answer: The space of the random variable X is given by

$$R_X = \{1, 2, 3, \ldots, 12\}.$$

Then

$$F(1) = \sum_{t \leq 1} f(t) = f(1) = \frac{1}{144}$$

$$F(2) = \sum_{t \leq 2} f(t) = f(1) + f(2) = \frac{1}{144} + \frac{3}{144} = \frac{4}{144}$$

$$F(3) = \sum_{t \leq 3} f(t) = f(1) + f(2) + f(3) = \frac{1}{144} + \frac{3}{144} + \frac{5}{144} = \frac{9}{144}$$

$$\vdots$$

$$F(12) = \sum_{t \leq 12} f(t) = f(1) + f(2) + \cdots + f(12) = 1.$$

F(x) represents the accumulation of f(t) up to $t \leq x$.
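Examples 3.7 and 3.8 can be retraced numerically: normalize k(2x - 1) over the space {1, ..., 12}, then accumulate the density to obtain the cumulative distribution function. The sketch below uses exact fractions; the names `space`, `pmf`, and `cdf` are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction
from itertools import accumulate

space = range(1, 13)

# Normalizing constant: the values k(2x - 1) must sum to 1 over the space.
k = Fraction(1, sum(2 * x - 1 for x in space))

# Density values f(1), ..., f(12) and their running sums F(1), ..., F(12).
pmf = [k * (2 * x - 1) for x in space]
cdf = list(accumulate(pmf))  # cdf[i] holds F(i + 1)

print(k)
print(cdf[:3], cdf[-1])
```

Because the sum of the first x odd numbers is x squared, the running sums here coincide with x^2/144 at every point of the space.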

Theorem 3.3. Let X be a random variable with cumulative distribution
function F(x). Then the cumulative distribution function satisfies the
following:

(a) $F(-\infty) = 0$,

(b) $F(\infty) = 1$, and

(c) F(x) is an increasing function, that is if $x < y$, then $F(x) \leq F(y)$ for
all reals x, y.

The proof of this theorem is straightforward and we leave it to the students.

Theorem 3.4. If the space $R_X$ of the random variable X is given by
$R_X = \{x_1 < x_2 < x_3 < \cdots < x_n\}$, then

$$\begin{aligned}
f(x_1) &= F(x_1) \\
f(x_2) &= F(x_2) - F(x_1) \\
f(x_3) &= F(x_3) - F(x_2) \\
&\;\;\vdots \\
f(x_n) &= F(x_n) - F(x_{n-1}).
\end{aligned}$$

Theorem 3.2 tells us how to find the cumulative distribution function from the
probability density function, whereas Theorem 3.4 tells us how to find the
probability density function given the cumulative distribution function.

Example 3.9. Find the probability density function of the random variable
X whose cumulative distribution function is

$$F(x) = \begin{cases}
0.00 & \text{if } x < -1 \\
0.25 & \text{if } -1 \leq x < 1 \\
0.50 & \text{if } 1 \leq x < 3 \\
0.75 & \text{if } 3 \leq x < 5 \\
1.00 & \text{if } x \geq 5.
\end{cases}$$

Also, find (a) $P(X \leq 3)$, (b) $P(X = 3)$, and (c) $P(X < 3)$.

Answer: The space of this random variable is given by

$$R_X = \{-1, 1, 3, 5\}.$$


By the previous theorem, the probability density function of X is given by

$$\begin{aligned}
f(-1) &= 0.25 \\
f(1) &= 0.50 - 0.25 = 0.25 \\
f(3) &= 0.75 - 0.50 = 0.25 \\
f(5) &= 1.00 - 0.75 = 0.25.
\end{aligned}$$

The probability $P(X \leq 3)$ can be computed by using the definition of F.
Hence

$$P(X \leq 3) = F(3) = 0.75.$$

The probability $P(X = 3)$ can be computed from

$$P(X = 3) = F(3) - F(1) = 0.75 - 0.50 = 0.25.$$

Finally, we get $P(X < 3)$ from

$$P(X < 3) = P(X \leq 1) = F(1) = 0.5.$$
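The passage from a step cumulative distribution function to its density, as in Theorem 3.4 and Example 3.9, amounts to reading off the jumps. Here is a minimal sketch; the list `cdf_steps` and the helper `F` are hypothetical names introduced only for this illustration.

```python
from fractions import Fraction

# Jump points of F in increasing order, with the value F takes from that point on.
cdf_steps = [
    (-1, Fraction(1, 4)),
    (1, Fraction(1, 2)),
    (3, Fraction(3, 4)),
    (5, Fraction(1)),
]

def F(x):
    """Right-continuous step cdf: value at the last jump point that is <= x."""
    value = Fraction(0)
    for point, level in cdf_steps:
        if point <= x:
            value = level
    return value

# Theorem 3.4: f at each point of the space is the size of the jump of F there.
pmf = {}
previous = Fraction(0)
for point, level in cdf_steps:
    pmf[point] = level - previous
    previous = level

p_le_3 = F(3)            # P(X <= 3)
p_eq_3 = pmf[3]          # P(X = 3)
p_lt_3 = F(3) - pmf[3]   # P(X < 3) = P(X <= 3) - P(X = 3)
print(pmf, p_le_3, p_eq_3, p_lt_3)
```

Writing P(X < 3) as F(3) minus the jump at 3 avoids having to name the point of the space immediately below 3.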

We close this section with an example showing that there is no one-to-one
correspondence between a random variable and its distribution function.
Consider a fair coin tossing experiment with the sample space consisting of a
head and a tail, that is S = {head, tail}. Define two random variables $X_1$
and $X_2$ from S as follows:

$$X_1(\text{head}) = 0 \quad \text{and} \quad X_1(\text{tail}) = 1$$

and

$$X_2(\text{head}) = 1 \quad \text{and} \quad X_2(\text{tail}) = 0.$$

It is easy to see that both these random variables have the same distribution
function, namely

$$F_{X_i}(x) = \begin{cases}
0 & \text{if } x < 0 \\
\frac{1}{2} & \text{if } 0 \leq x < 1 \\
1 & \text{if } 1 \leq x
\end{cases}$$

for i = 1, 2. Hence there is no one-to-one correspondence between a random
variable and its distribution function.

3.3. Distribution Functions of Continuous Random Variables 

A random variable X is said to be continuous if its space is either an
interval or a union of intervals. The following definition formally defines a
continuous random variable.

Definition 3.6. A random variable X is said to be a continuous random
variable if there exists a continuous function $f : \mathbb{R} \to [0, \infty)$ such that for
every set of real numbers A

$$P(X \in A) = \int_A f(x)\,dx. \qquad (1)$$

Definition 3.7. The function f in (1) is called the probability density
function of the continuous random variable X.

It can be easily shown that for every probability density function f,

$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$


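Example 3.10. Is the real valued function $f : \mathbb{R} \to \mathbb{R}$ defined by

$$f(x) = \begin{cases}
2\,x^{-2} & \text{if } 1 < x < 2 \\
0 & \text{otherwise,}
\end{cases}$$

a probability density function for some random variable X?

Answer: We have to show that f is nonnegative and the area under f(x)
is unity. Since $f(x) = 2x^{-2} > 0$ on the interval (1, 2) and f(x) = 0
elsewhere, it is clear that f is nonnegative. Next, we calculate

$$\begin{aligned}
\int_{-\infty}^{\infty} f(x)\,dx &= \int_1^2 2\,x^{-2}\,dx \\
&= -2 \left[ \frac{1}{x} \right]_1^2 \\
&= -2 \left[ \frac{1}{2} - 1 \right] \\
&= 1.
\end{aligned}$$

Thus f is a probability density function.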

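Example 3.11. Is the real valued function $f : \mathbb{R} \to \mathbb{R}$ defined by

$$f(x) = \begin{cases}
1 + |x| & \text{if } -1 < x < 1 \\
0 & \text{otherwise,}
\end{cases}$$

a probability density function for some random variable X?

Answer: It is easy to see that f is nonnegative, that is $f(x) \geq 0$, since
$f(x) = 1 + |x|$ on (-1, 1). Next we show that the area under f is not unity.
For this we compute

$$\begin{aligned}
\int_{-\infty}^{\infty} f(x)\,dx &= \int_{-1}^{1} (1 + |x|)\,dx \\
&= \int_{-1}^{0} (1 - x)\,dx + \int_{0}^{1} (1 + x)\,dx \\
&= \left[ x - \frac{1}{2}x^2 \right]_{-1}^{0} + \left[ x + \frac{1}{2}x^2 \right]_{0}^{1} \\
&= 1 + \frac{1}{2} + 1 + \frac{1}{2} \\
&= 3.
\end{aligned}$$

Thus f is not a probability density function for any random variable X.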
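The two area computations in Examples 3.10 and 3.11 can be cross-checked with a simple numerical rule. The sketch below uses a midpoint Riemann sum, which is an assumption of this illustration and not a method from the text; `midpoint_integral` is a hypothetical helper name.

```python
def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Example 3.10: f(x) = 2 x^(-2) on (1, 2); the area should be close to 1.
area_310 = midpoint_integral(lambda x: 2 / x**2, 1, 2)

# Example 3.11: f(x) = 1 + |x| on (-1, 1); the area should be close to 3.
area_311 = midpoint_integral(lambda x: 1 + abs(x), -1, 1)

print(area_310, area_311)
```

An area near 1 is consistent with a probability density function, while an area near 3 rules one out regardless of nonnegativity.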


Example 3.12. For what value of the constant c is the real valued function
$f : \mathbb{R} \to \mathbb{R}$ given by

$$f(x) = \frac{c}{1 + (x - \theta)^2}, \qquad -\infty < x < \infty,$$

where $\theta$ is a real parameter, a probability density function for a random
variable X?

Answer: Since f is nonnegative, we see that $c \geq 0$. To find the value of c,
we use the fact that for a pdf the area is unity, that is

$$\begin{aligned}
1 &= \int_{-\infty}^{\infty} f(x)\,dx \\
&= \int_{-\infty}^{\infty} \frac{c}{1 + (x - \theta)^2}\,dx \\
&= \int_{-\infty}^{\infty} \frac{c}{1 + z^2}\,dz \qquad (z = x - \theta) \\
&= c \left[ \tan^{-1} z \right]_{-\infty}^{\infty} \\
&= c \left[ \tan^{-1}(\infty) - \tan^{-1}(-\infty) \right] \\
&= c \left[ \frac{1}{2}\pi + \frac{1}{2}\pi \right] \\
&= c\,\pi.
\end{aligned}$$

Hence $c = \frac{1}{\pi}$ and the density function becomes

$$f(x) = \frac{1}{\pi\,[1 + (x - \theta)^2]}, \qquad -\infty < x < \infty.$$

This density function is called the Cauchy distribution function with
parameter $\theta$. If a random variable X has this pdf then it is called a Cauchy random
variable and is denoted by $X \sim CAU(\theta)$.

This distribution is symmetric about $\theta$. Further, it achieves its
maximum at $x = \theta$. The following figure illustrates the symmetry of the
distribution for $\theta = 2$.

[Figure: The density function with theta equal to 2]



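Example 3.13. For what value of the constant c is the real valued function
$f : \mathbb{R} \to \mathbb{R}$ given by

$$f(x) = \begin{cases}
c & \text{if } a \leq x \leq b \\
0 & \text{otherwise,}
\end{cases}$$

where a, b are real constants with a < b, a probability density function for a
random variable X?

Answer: Since the area under a probability density function is unity, we get

$$1 = \int_{-\infty}^{\infty} f(x)\,dx = \int_a^b c\,dx = c\,(b - a).$$

Hence $c = \frac{1}{b-a}$, and the probability density function becomes

$$f(x) = \begin{cases}
\frac{1}{b-a} & \text{if } a \leq x \leq b \\
0 & \text{otherwise.}
\end{cases}$$

This probability density function is called the uniform distribution on
the interval [a, b]. If a random variable X has this pdf then it is called a
uniform random variable and is denoted by $X \sim UNIF(a, b)$. The following
is a graph of the probability density function of a uniform random variable on
the interval [2, 5].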



Definition 3.8. Let f(x) be the probability density function of a
continuous random variable X. The cumulative distribution function F(x) of X is
defined as

$$F(x) = P(X \leq x) = \int_{-\infty}^{x} f(t)\,dt.$$

The cumulative distribution function F(x) represents the area under the
probability density function f(x) on the interval $(-\infty, x)$ (see figure below).

Like the discrete case, the cdf is an increasing function of x, and it takes
value 0 at negative infinity and 1 at positive infinity.

Theorem 3.5. If F(x) is the cumulative distribution function of a
continuous random variable X, the probability density function f(x) of X is the
derivative of F(x), that is

$$\frac{d}{dx} F(x) = f(x).$$

Proof: By the Fundamental Theorem of Calculus, we get

$$\frac{d}{dx} F(x) = \frac{d}{dx} \left( \int_{-\infty}^{x} f(t)\,dt \right) = f(x).$$

This theorem tells us that if the random variable is continuous, then we can
find the pdf given the cdf by taking the derivative of the cdf. Recall that for
a discrete random variable, the pdf at a point in the space of the random variable
can be obtained from the cdf by taking the difference between the cdf at the
point and the cdf immediately below the point.

Example 3.14. What is the cumulative distribution function of the Cauchy
random variable with parameter $\theta$?

Answer: The cdf of X is given by

$$\begin{aligned}
F(x) &= \int_{-\infty}^{x} f(t)\,dt \\
&= \int_{-\infty}^{x} \frac{1}{\pi\,[1 + (t - \theta)^2]}\,dt \\
&= \int_{-\infty}^{x - \theta} \frac{1}{\pi\,[1 + z^2]}\,dz \\
&= \frac{1}{\pi} \tan^{-1}(x - \theta) + \frac{1}{2}.
\end{aligned}$$

Example 3.15. What is the probability density function of the random
variable whose cdf is

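$$F(x) = \frac{1}{1 + e^{-x}}, \qquad -\infty < x < \infty?$$

Answer: The pdf of the random variable is given by

$$\begin{aligned}
f(x) &= \frac{d}{dx} F(x) \\
&= \frac{d}{dx} \left( 1 + e^{-x} \right)^{-1} \\
&= (-1)\,(1 + e^{-x})^{-2}\,\frac{d}{dx}\left( 1 + e^{-x} \right) \\
&= \frac{e^{-x}}{(1 + e^{-x})^2}.
\end{aligned}$$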


Next, we briefly discuss the problem of finding probability when the cdf
is given. We summarize our results in the following theorem.

Theorem 3.6. Let X be a continuous random variable whose cdf is F(x).
Then the following are true:

(a) $P(X < x) = F(x)$,

(b) $P(X > x) = 1 - F(x)$,

(c) $P(X = x) = 0$, and

(d) $P(a < X < b) = F(b) - F(a)$.
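Theorem 3.6(d) and Theorem 3.5 can both be illustrated with the Cauchy cdf obtained in Example 3.14. The following sketch is illustrative only: the choice theta = 2, the interval (1, 3), the evaluation point 2.7, and the function names are assumptions made for the example.

```python
import math

def cauchy_pdf(x, theta=0.0):
    """Density 1 / (pi [1 + (x - theta)^2]) from Example 3.12."""
    return 1.0 / (math.pi * (1.0 + (x - theta) ** 2))

def cauchy_cdf(x, theta=0.0):
    """Cdf (1/pi) arctan(x - theta) + 1/2 from Example 3.14."""
    return math.atan(x - theta) / math.pi + 0.5

theta = 2.0

# Theorem 3.6(d): P(1 < X < 3) = F(3) - F(1).
prob = cauchy_cdf(3.0, theta) - cauchy_cdf(1.0, theta)

# Theorem 3.5: a central difference of F should approximate f.
h = 1e-5
x = 2.7
derivative = (cauchy_cdf(x + h, theta) - cauchy_cdf(x - h, theta)) / (2 * h)

print(prob, derivative, cauchy_pdf(x, theta))
```

Since arctan(1) = pi/4, the interval (1, 3) centered at theta = 2 carries exactly half of the probability mass.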

3.4. Percentiles for Continuous Random Variables 

In this section, we discuss various percentiles of a continuous random
variable. If the random variable is discrete, then to discuss percentiles, we
have to know the order statistics of samples. We shall treat the percentile of
a discrete random variable in Chapter 13.

Definition 3.9. Let p be a real number between 0 and 1. A 100p-th percentile
of the distribution of a random variable X is any real number q satisfying

$$P(X \leq q) \leq p \quad \text{and} \quad P(X > q) \leq 1 - p.$$

A 100p-th percentile is a measure of location for the probability
distribution in the sense that q divides the distribution of the probability mass into
two parts, one having probability mass p and other having probability mass
1 - p (see diagram below).



Example 3.16. If the random variable X has the density function

$$f(x) = \begin{cases}
e^{x-2} & \text{for } x < 2 \\
0 & \text{otherwise,}
\end{cases}$$

then what is the 75th percentile of X?

Answer: Since 100p = 75, we get p = 0.75. By definition of percentile, we
have

$$\begin{aligned}
0.75 = p &= \int_{-\infty}^{q} f(x)\,dx \\
&= \int_{-\infty}^{q} e^{x-2}\,dx \\
&= \left[ e^{x-2} \right]_{-\infty}^{q} \\
&= e^{q-2}.
\end{aligned}$$

From this solving for q, we get the 75th percentile to be

$$q = 2 + \ln\left( \frac{3}{4} \right).$$


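Example 3.17. What is the 87.5th percentile for the distribution with density
function

$$f(x) = \frac{1}{2}\,e^{-|x|}, \qquad -\infty < x < \infty?$$

Answer: Note that this density function is symmetric about the y-axis, that
is f(x) = f(-x). Hence

$$\int_{-\infty}^{0} f(x)\,dx = \frac{1}{2}.$$

Since 0.875 > 1/2, the 87.5th percentile q is positive, and

$$\begin{aligned}
\frac{87.5}{100} &= \int_{-\infty}^{q} f(x)\,dx \\
&= \int_{-\infty}^{0} \frac{1}{2}\,e^{x}\,dx + \int_{0}^{q} \frac{1}{2}\,e^{-x}\,dx \\
&= \frac{1}{2} + \frac{1}{2} - \frac{1}{2}\,e^{-q}.
\end{aligned}$$

Therefore solving for q, we get

$$0.125 = \frac{1}{2}\,e^{-q}, \qquad q = \ln\left( \frac{1}{0.25} \right) = \ln 4.$$

Hence the 87.5th percentile of the distribution is ln 4.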
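For a continuous random variable, a percentile is the solution q of F(q) = p, which can also be found numerically. The sketch below applies bisection to the cdfs of Examples 3.16 and 3.17. Bisection and the search bounds of -50 and 50 are choices made for this illustration, and `percentile` is a hypothetical helper, not notation from the text.

```python
import math

def laplace_cdf(x):
    """Cdf of the density f(x) = (1/2) e^{-|x|} of Example 3.17."""
    return 0.5 * math.exp(x) if x < 0 else 1.0 - 0.5 * math.exp(-x)

def shifted_exp_cdf(x):
    """Cdf of the density f(x) = e^{x-2} for x < 2 of Example 3.16."""
    return math.exp(min(x, 2.0) - 2.0)

def percentile(cdf, p, lo=-50.0, hi=50.0, tol=1e-12):
    """Solve cdf(q) = p by bisection, assuming cdf is continuous and increasing."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

q_317 = percentile(laplace_cdf, 0.875)      # compare with ln 4
q_316 = percentile(shifted_exp_cdf, 0.75)   # compare with 2 + ln(3/4)
print(q_317, q_316)
```

The same helper applies to any cdf that is continuous and strictly increasing near the percentile sought.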

Example 3.18. Let the continuous random variable X have the density
function f(x) as shown in the figure below:

What is the 25th percentile of the distribution of X?

Answer: Since the line passes through the points (0, 0) and $(a, \frac{1}{4})$, the
function f(x) is equal to

$$f(x) = \frac{1}{4a}\,x.$$

Since f(x) is a density function the area under f(x) should be unity. Hence

$$\begin{aligned}
1 &= \int_0^a f(x)\,dx \\
&= \int_0^a \frac{1}{4a}\,x\,dx \\
&= \frac{1}{8a}\,a^2 \\
&= \frac{a}{8}.
\end{aligned}$$

Thus a = 8. Hence the probability density function of X is

$$f(x) = \frac{1}{32}\,x.$$

Now we want to find the 25th percentile.

$$\begin{aligned}
\frac{25}{100} &= \int_0^q f(x)\,dx \\
&= \int_0^q \frac{1}{32}\,x\,dx \\
&= \frac{1}{64}\,q^2.
\end{aligned}$$

Hence $q = \sqrt{16}$, that is the 25th percentile of the above distribution is 4.

Definition 3.10. The 25th and 75th percentiles of any distribution are
called the first and the third quartiles, respectively.

Definition 3.11. The 50th percentile of any distribution is called the median
of the distribution.

The median divides the distribution of the probability mass into two
equal parts (see the following figure).

If a probability density function f(x) is symmetric about the y-axis, then the
median is always 0.

Example 3.19. A random variable is called standard normal if its
probability density function is of the form

$$f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}, \qquad -\infty < x < \infty.$$

What is the median of X?

Answer: Notice that f(x) = f(-x), hence the probability density function
is symmetric about the y-axis. Thus the median of X is 0.

Definition 3.12. A mode of the distribution of a continuous random variable
X is the value of x where the probability density function f(x) attains a
relative maximum (see diagram).

A mode of a random variable X is one of its most probable values. A
random variable can have more than one mode.

Example 3.20. Let X be a uniform random variable on the interval [0, 1],
that is $X \sim UNIF(0, 1)$. How many modes does X have?

Answer: Since $X \sim UNIF(0, 1)$, the probability density function of X is

$$f(x) = \begin{cases}
1 & \text{if } 0 \leq x \leq 1 \\
0 & \text{otherwise.}
\end{cases}$$

Hence the derivative of f(x) is

$$f'(x) = 0, \qquad x \in (0, 1).$$

Therefore X has infinitely many modes.

Example 3.21. Let X be a Cauchy random variable with parameter $\theta = 0$,
that is $X \sim CAU(0)$. What is the mode of X?

Answer: Since $X \sim CAU(0)$, the probability density function of X is

$$f(x) = \frac{1}{\pi\,(1 + x^2)}, \qquad -\infty < x < \infty.$$

Hence

$$f'(x) = \frac{-2x}{\pi\,(1 + x^2)^2}.$$

Setting this derivative to 0, we get x = 0. Thus the mode of X is 0.

Example 3.22. Let X be a continuous random variable with density
function

$$f(x) = \begin{cases}
x^2\,e^{-bx} & \text{for } x \geq 0 \\
0 & \text{otherwise,}
\end{cases}$$

where b > 0. What is the mode of X?

Answer:

$$\begin{aligned}
0 &= \frac{df}{dx} \\
&= 2x\,e^{-bx} - x^2\,b\,e^{-bx} \\
&= (2 - bx)\,x\,e^{-bx}.
\end{aligned}$$

Since $e^{-bx} > 0$, this gives

$$x = 0 \qquad \text{or} \qquad x = \frac{2}{b}.$$

Since f(0) = 0 is the minimum value of f, the mode of X is $\frac{2}{b}$. The graph
of f(x) for b = 4 is shown below.

[Figure: The mode of f(x) with b = 4]
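The calculus argument of Example 3.22 can be compared with a direct search for the maximum of f. The sketch below evaluates f(x) = x^2 e^{-bx} on a fine grid for b = 4; the grid range [0, 3] and its spacing are assumptions chosen to match the range of the plotted figure.

```python
import math

def f(x, b):
    """The function x^2 e^{-bx} for x >= 0, and 0 otherwise (Example 3.22)."""
    return x * x * math.exp(-b * x) if x >= 0 else 0.0

b = 4.0

# Grid on [0, 3] with spacing 1e-5; the maximizer approximates the mode.
grid = [i / 100_000 for i in range(0, 300_001)]
mode = max(grid, key=lambda x: f(x, b))

print(mode, 2 / b)
```

A grid search only locates the maximum to within the grid spacing, so it serves as a check on the value 2/b rather than a replacement for the derivative argument.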


Example 3.23. A continuous random variable has density function

$$f(x) = \begin{cases}
\frac{3x^2}{\theta^3} & \text{for } 0 \leq x \leq \theta \\
0 & \text{otherwise,}
\end{cases}$$

for some $\theta > 0$. What is the ratio of the mode to the median for this
distribution?

Answer: For fixed $\theta > 0$, the density function f(x) is an increasing function
on $[0, \theta]$. Thus, f(x) has its maximum at the right end point of the interval
$[0, \theta]$. Hence the mode of this distribution is $\theta$.

Next we compute the median of this distribution.

$$\begin{aligned}
\frac{1}{2} &= \int_0^q f(x)\,dx \\
&= \int_0^q \frac{3x^2}{\theta^3}\,dx \\
&= \left[ \frac{x^3}{\theta^3} \right]_0^q \\
&= \frac{q^3}{\theta^3}.
\end{aligned}$$

Hence

$$q = 2^{-\frac{1}{3}}\,\theta.$$

Thus the ratio of the mode of this distribution to the median is

$$\frac{\text{mode}}{\text{median}} = \frac{\theta}{2^{-\frac{1}{3}}\,\theta} = \sqrt[3]{2}.$$

Example 3.24. A continuous random variable has density function


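$$f(x) = \begin{cases}
\frac{3x^2}{\theta^3} & \text{for } 0 \leq x \leq \theta \\
0 & \text{otherwise,}
\end{cases}$$

for some $\theta > 0$. What is the probability of X less than the ratio of the mode
to the median of this distribution?

Answer: In the previous example, we have shown that the ratio of the mode
to the median of this distribution is given by

$$a := \frac{\text{mode}}{\text{median}} = \sqrt[3]{2}.$$

Hence the probability of X less than the ratio of the mode to the median of
this distribution is (assuming $\sqrt[3]{2} \leq \theta$, so that a lies in the interval $[0, \theta]$)

$$\begin{aligned}
P(X < a) &= \int_0^a f(x)\,dx \\
&= \int_0^a \frac{3x^2}{\theta^3}\,dx \\
&= \left[ \frac{x^3}{\theta^3} \right]_0^a \\
&= \frac{a^3}{\theta^3} \\
&= \frac{\left( \sqrt[3]{2} \right)^3}{\theta^3} = \frac{2}{\theta^3}.
\end{aligned}$$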


3.5. Review Exercises 


1. Let the random variable X have the density function

$$f(x) = \begin{cases}
kx & \text{for } 0 \leq x \leq \sqrt{\frac{2}{k}} \\
0 & \text{elsewhere.}
\end{cases}$$

If the mode of this distribution is at x = V, then what is the median of X?

2. The random variable X has density function

$$f(x) = \begin{cases}
c\,x^{k+1}\,(1 - x)^k & \text{for } 0 < x < 1 \\
0 & \text{otherwise,}
\end{cases}$$

where c > 0 and 1 < k < 2. What is the mode of X?

3. The random variable X has density function


$$f(x) = \begin{cases}
(k + 1)\,x^2 & \text{for } 0 < x < 1 \\
0 & \text{otherwise,}
\end{cases}$$

where k is a constant. What is the median of X?

4. What are the median and mode, respectively, for the density function

$$f(x) = \frac{1}{\pi\,(1 + x^2)}, \qquad -\infty < x < \infty?$$


5. What is the 10 th percentile of the random variable X whose probability 
density function is 


/ 0 ) = 


d if x > 0, 8 > 0 


elsewhere? 


6. What is the median of the random variable X whose probability density 
function is 


/ 0 ) = 


^ e 2 if x > 0 


elsewhere? 


7. A continuous random variable X has the density 


f{x) = 


for 0 < x < 2 


0 otherwise. 


What is the probability that X is greater than its 75 th percentile? 

8. What is the probability density function of the random variable X if its
cumulative distribution function is given by

$$F(x) = \begin{cases}
0.0 & \text{if } x < 2 \\
0.5 & \text{if } 2 \leq x < 3 \\
0.7 & \text{if } 3 \leq x < \pi \\
1.0 & \text{if } x \geq \pi?
\end{cases}$$


9. Let the distribution of X for x > 0 be 


F(x) = 1-^2 


x k e x

What is the density function of X for x > 0?

10. Let X be a random variable with cumulative distribution function

$$F(x) = \begin{cases}
1 - e^{-x} & \text{for } x > 0 \\
0 & \text{for } x \leq 0.
\end{cases}$$

What is $P(0 \leq e^X \leq 4)$?

11. Let X be a continuous random variable with density function

$$f(x) = \begin{cases}
a\,x^2\,e^{-10x} & \text{for } x \geq 0 \\
0 & \text{otherwise,}
\end{cases}$$

where a > 0. What is the probability of X greater than or equal to the mode
of X?

12. Let the random variable X have the density function

$$f(x) = \begin{cases}
kx & \text{for } 0 \leq x \leq \sqrt{\frac{2}{k}} \\
0 & \text{elsewhere.}
\end{cases}$$

If the mode of this distribution is at x = V, then what is the probability of
X less than the median of X?

13. The random variable X has density function

$$f(x) = \begin{cases}
(k + 1)\,x^2 & \text{for } 0 < x < 1 \\
0 & \text{otherwise,}
\end{cases}$$

where k is a constant. What is the probability of X between the first and
third quartiles?

14. Let X be a random variable having continuous cumulative
distribution function F(x). What is the cumulative distribution function of
Y = max(0, -X)?

15. Let X be a random variable with probability density function 

/( x ) = for x= 1,2,3,.... 

What is the probability that X is even?

16. An urn contains 5 balls numbered 1 through 5. Two balls are selected
at random without replacement from the urn. If the random variable X
denotes the sum of the numbers on the 2 balls, then what are the space and
the probability density function of X?

17. A pair of six-sided dice is rolled and the sum is determined. If the
random variable X denotes the sum of the numbers rolled, then what are the
space and the probability density function of X?

18. Five digit codes are selected at random from the set {0, 1, 2, ..., 9} with
replacement. If the random variable X denotes the number of zeros in
randomly chosen codes, then what are the space and the probability density
function of X?

19. An urn contains 10 coins of which 4 are counterfeit. Coins are removed
from the urn, one at a time, until all counterfeit coins are found. If the
random variable X denotes the number of coins removed to find the first
counterfeit one, then what are the space and the probability density function
of X?

20. Let X be a random variable with probability density function

/O) = ^ for x = 1, 2,3,4, ...,oo 

for some constant c. What is the value of c? What is the probability that X
is even?

21. If the random variable X possesses the density function

$$f(x) = \begin{cases}
cx & \text{if } 0 \leq x \leq 2 \\
0 & \text{otherwise,}
\end{cases}$$

then what is the value of c for which f(x) is a probability density function?
What is the cumulative distribution function of X? Graph the functions f(x)
and F(x). Use F(x) to compute $P(1 \leq X \leq 2)$.

22. The length of time required by students to complete a 1-hour exam is a
random variable with a pdf given by

$$f(x) = \begin{cases}
c\,x^2 + x & \text{if } 0 \leq x \leq 1 \\
0 & \text{otherwise.}
\end{cases}$$

What is the probability that a student finishes in less than a half hour?

23. What is the probability of, when blindfolded, hitting a circle inscribed
on a square wall?

24. Let f(x) be a continuous probability density function. Show that, for
every $-\infty < \mu < \infty$ and $\sigma > 0$, the function $\frac{1}{\sigma}\,f\!\left( \frac{x - \mu}{\sigma} \right)$ is also a probability
density function.

25. Let X be a random variable with probability density function f(x) and
cumulative distribution function F(x). True or False?

(a) f(x) can't be larger than 1. (b) F(x) can't be larger than 1. (c) f(x)
can't decrease. (d) F(x) can't decrease. (e) f(x) can't be negative. (f) F(x)
can't be negative. (g) Area under f must be 1. (h) Area under F must be
1. (i) f can't jump. (j) F can't jump.

Chapter 4

MOMENTS OF RANDOM
VARIABLES
AND

CHEBYCHEV INEQUALITY


4.1. Moments of Random Variables 

In this chapter, we introduce the concepts of various moments of a ran¬ 
dom variable. Further, we examine the expected value and the variance of 
random variables in detail. We shall conclude this chapter with a discussion 
of Chebychev’s inequality. 

Definition 4.1. The n-th moment about the origin of a random variable X,
as denoted by $E(X^n)$, is defined to be

$$E(X^n) = \begin{cases}
\sum_{x \in R_X} x^n\,f(x) & \text{if } X \text{ is discrete} \\
\int_{-\infty}^{\infty} x^n\,f(x)\,dx & \text{if } X \text{ is continuous}
\end{cases}$$

for n = 0, 1, 2, 3, ..., provided the right side converges absolutely.

If n = 1, then E(X) is called the first moment about the origin. If
n = 2, then $E(X^2)$ is called the second moment of X about the origin. In
general, these moments may or may not exist for a given random variable.
If for a random variable, a particular moment does not exist, then we say
that the random variable does not have that moment. For these moments to
exist one requires absolute convergence of the sum or the integral. Next, we
shall define two important characteristics of a random variable, namely the
expected value and variance. Occasionally $E(X^n)$ will be written as $E[X^n]$.

4.2. Expected Value of Random Variables

A random variable X is characterized by its probability density function, 
which defines the relative likelihood of assuming one value over the others. 
In Chapter 3, we have seen that given a probability density function / of 
a random variable X, one can construct the distribution function F of it 
through summation or integration. Conversely, the density function f(x) 
can be obtained as the marginal value or derivative of F(x). The density 
function can be used to infer a number of characteristics of the underlying 
random variable. The two most important attributes are measures of location 
and dispersion. In this section, we treat the measure of location and treat 
the other measure in the next section. 

Definition 4.2. Let X be a random variable with space $R_X$ and probability
density function f(x). The mean $\mu_X$ of the random variable X is defined as

$$\mu_X = \begin{cases}
\sum_{x \in R_X} x\,f(x) & \text{if } X \text{ is discrete} \\
\int_{-\infty}^{\infty} x\,f(x)\,dx & \text{if } X \text{ is continuous}
\end{cases}$$

if the right hand side exists.

The mean of a random variable is a composite of its values weighted by the 
corresponding probabilities. The mean is a measure of central tendency: the 
value that the random variable takes “on average.” The mean is also called 
the expected value of the random variable X and is denoted by E(X). The 
symbol E is called the expectation operator. The expected value of a random 
variable may or may not exist. 

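Example 4.1. If X is a uniform random variable on the interval (2, 7), then
what is the mean of X?

Answer: The density function of X is

$$f(x) = \begin{cases}
\frac{1}{5} & \text{if } 2 \leq x \leq 7 \\
0 & \text{otherwise.}
\end{cases}$$

Thus the mean or the expected value of X is

$$\begin{aligned}
\mu_X = E(X) &= \int_{-\infty}^{\infty} x\,f(x)\,dx \\
&= \int_2^7 x\,\frac{1}{5}\,dx \\
&= \left[ \frac{1}{10}\,x^2 \right]_2^7 \\
&= \frac{1}{10}\,(49 - 4) \\
&= \frac{45}{10} \\
&= \frac{9}{2} \\
&= \frac{2 + 7}{2}.
\end{aligned}$$

In general, if $X \sim UNIF(a, b)$, then $E(X) = \frac{1}{2}(a + b)$.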

Example 4.2. If X is a Cauchy random variable with parameter $\theta$, that is
$X \sim CAU(\theta)$, then what is the expected value of X?

Answer: We want to find the expected value of X if it exists. The expected
value of X will exist if the integral $\int_{\mathbb{R}} x\,f(x)\,dx$ converges absolutely, that is

$$\int_{\mathbb{R}} |x\,f(x)|\,dx < \infty.$$

If this integral diverges, then the expected value of X does not exist. Hence,
let us find out if $\int_{\mathbb{R}} |x\,f(x)|\,dx$ converges or not. Substituting $z = x - \theta$ and
using $|z + \theta| \geq |z| - |\theta|$, we get

$$\begin{aligned}
\int_{\mathbb{R}} |x\,f(x)|\,dx &= \int_{-\infty}^{\infty} \frac{|x|}{\pi\,[1 + (x - \theta)^2]}\,dx \\
&= \int_{-\infty}^{\infty} \frac{|z + \theta|}{\pi\,[1 + z^2]}\,dz \\
&\geq \int_{-\infty}^{\infty} \frac{|z|}{\pi\,[1 + z^2]}\,dz - |\theta| \int_{-\infty}^{\infty} \frac{1}{\pi\,[1 + z^2]}\,dz \\
&= 2 \int_0^{\infty} \frac{z}{\pi\,[1 + z^2]}\,dz - |\theta| \\
&= \left[ \frac{1}{\pi} \ln(1 + z^2) \right]_0^{\infty} - |\theta| \\
&= \frac{1}{\pi} \lim_{b \to \infty} \ln(1 + b^2) - |\theta| \\
&= \infty.
\end{aligned}$$

Since the above integral diverges, the expected value for the Cauchy
distribution does not exist.

Remark 4.1. Indeed, it can be shown that for a random variable X with the
Cauchy distribution, $E(X^n)$ does not exist for any natural number n. Thus,
Cauchy random variables have no moments at all.
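The divergence in Example 4.2 can be made concrete for the case theta = 0, where the integral of |x| f(x) over [-b, b] equals (1/pi) ln(1 + b^2). The sketch below tabulates this closed form for growing b; the function name and the chosen values of b are assumptions made for illustration.

```python
import math

def truncated_abs_moment(b):
    """Integral of |x| f(x) over [-b, b] for the CAU(0) density: (1/pi) ln(1 + b^2)."""
    return math.log1p(b * b) / math.pi

# The truncated integral keeps growing as b increases by powers of ten.
cutoffs = [10.0 ** k for k in range(1, 7)]
values = [truncated_abs_moment(b) for b in cutoffs]

for b, v in zip(cutoffs, values):
    print(b, v)
```

Each tenfold increase in b adds roughly 2 ln(10)/pi to the truncated integral, so no finite limit is approached.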

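Example 4.3. If the probability density function of the random variable X
is

$$f(x) = \begin{cases}
(1 - p)^{x-1}\,p & \text{if } x = 1, 2, 3, 4, \ldots \\
0 & \text{otherwise,}
\end{cases}$$

then what is the expected value of X?

Answer: The expected value of X is

$$\begin{aligned}
E(X) &= \sum_{x \in R_X} x\,f(x) \\
&= \sum_{x=1}^{\infty} x\,(1 - p)^{x-1}\,p \\
&= p \sum_{x=1}^{\infty} x\,(1 - p)^{x-1} \\
&= p\,\frac{1}{[1 - (1 - p)]^2} \\
&= \frac{1}{p},
\end{aligned}$$

where we used $\sum_{x=1}^{\infty} x\,r^{x-1} = \frac{1}{(1 - r)^2}$ for $|r| < 1$, the derivative of the
geometric series.

Hence the expected value of X is the reciprocal of the parameter p.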
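The value E(X) = 1/p from Example 4.3 can be checked with a truncated sum. In the sketch below, the choice p = 0.3 and the cutoff of 2000 terms are assumptions made for illustration; the neglected tail of the series is negligible at that cutoff.

```python
def geometric_pmf(x, p):
    """f(x) = (1 - p)^(x - 1) p for x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p

p = 0.3
terms = 2_000

# Partial sums of the total probability and of the mean.
total = sum(geometric_pmf(x, p) for x in range(1, terms + 1))
mean = sum(x * geometric_pmf(x, p) for x in range(1, terms + 1))

print(total, mean, 1 / p)
```

The partial sum of the probabilities being close to 1 confirms that the cutoff captures essentially all of the mass.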

Definition 4.3. A random variable X whose probability density function
is given by

$$f(x) = \begin{cases}
(1 - p)^{x-1}\,p & \text{if } x = 1, 2, 3, 4, \ldots \\
0 & \text{otherwise}
\end{cases}$$

is called a geometric random variable and is denoted by $X \sim GEO(p)$.

Example 4.4. A couple decides to have 3 children. If none of the 3 is a
girl, they will try again; and if they still don't get a girl, they will try once
more. If the random variable X denotes the number of children the couple
will have following this scheme, then what is the expected value of X?

Answer: Since the couple can have 3 or 4 or 5 children, the space of the
random variable X is

$$R_X = \{3, 4, 5\}.$$

The probability density function of X is given by

$$\begin{aligned}
f(3) &= P(X = 3) \\
&= P(\text{at least one girl}) \\
&= 1 - P(\text{no girls}) \\
&= 1 - P(\text{3 boys in 3 tries}) \\
&= 1 - (P(\text{1 boy in each try}))^3 \\
&= 1 - \left( \frac{1}{2} \right)^3 \\
&= \frac{7}{8}.
\end{aligned}$$

$$\begin{aligned}
f(4) &= P(X = 4) \\
&= P(\text{3 boys and 1 girl in last try}) \\
&= (P(\text{1 boy in each try}))^3\,P(\text{1 girl in last try}) \\
&= \left( \frac{1}{2} \right)^3 \left( \frac{1}{2} \right) \\
&= \frac{1}{16}.
\end{aligned}$$

$$\begin{aligned}
f(5) &= P(X = 5) \\
&= P(\text{4 boys and 1 girl in last try}) + P(\text{5 boys in 5 tries}) \\
&= P(\text{1 boy in each try})^4\,P(\text{1 girl in last try}) + P(\text{1 boy in each try})^5 \\
&= \left( \frac{1}{2} \right)^4 \left( \frac{1}{2} \right) + \left( \frac{1}{2} \right)^5 \\
&= \frac{1}{16}.
\end{aligned}$$

Hence, the expected value of the random variable is

$$\begin{aligned}
E(X) &= \sum_{x \in R_X} x\,f(x) \\
&= \sum_{x=3}^{5} x\,f(x) \\
&= 3\,f(3) + 4\,f(4) + 5\,f(5) \\
&= 3\,\frac{14}{16} + 4\,\frac{1}{16} + 5\,\frac{1}{16} \\
&= \frac{42 + 4 + 5}{16} \\
&= \frac{51}{16} = 3\,\frac{3}{16}.
\end{aligned}$$

Remark 4.2. We interpret this physically as meaning that if many couples
have children according to this scheme, it is likely that the average family
size would be near $3\,\frac{3}{16}$ children.
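The arithmetic of Example 4.4 can be replayed with exact fractions. The sketch below assumes, as the example does, that each child is a boy or a girl with probability 1/2 independently; the variable names are illustrative.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Probability density function of the family size X.
pmf = {
    3: 1 - half**3,                # at least one girl among the first three
    4: half**3 * half,             # three boys, then a girl
    5: half**4 * half + half**5,   # four boys then a girl, or five boys
}

expected = sum(x * p for x, p in pmf.items())
print(pmf, expected)
```

The value 51/16 is the same as the mixed number 3 3/16 quoted in Remark 4.2.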

Example 4.5. A lot of 8 TV sets includes 3 that are defective. If 4 of the
sets are chosen at random for shipment to a hotel, how many defective sets
can they expect?

Answer: Let X be the random variable representing the number of defective
TV sets in a shipment of 4. Then the space of the random variable X is

$$R_X = \{0, 1, 2, 3\}.$$

Then the probability density function of X is given by

$$\begin{aligned}
f(x) &= P(X = x) \\
&= P(x \text{ defective TV sets in a shipment of four}) \\
&= \frac{\binom{3}{x} \binom{5}{4-x}}{\binom{8}{4}}, \qquad x = 0, 1, 2, 3.
\end{aligned}$$

Hence, we have

$$f(0) = \frac{\binom{3}{0} \binom{5}{4}}{\binom{8}{4}} = \frac{5}{70}, \qquad
f(1) = \frac{\binom{3}{1} \binom{5}{3}}{\binom{8}{4}} = \frac{30}{70},$$

$$f(2) = \frac{\binom{3}{2} \binom{5}{2}}{\binom{8}{4}} = \frac{30}{70}, \qquad
f(3) = \frac{\binom{3}{3} \binom{5}{1}}{\binom{8}{4}} = \frac{5}{70}.$$

Therefore, the expected value of X is given by

$$\begin{aligned}
E(X) &= \sum_{x \in R_X} x\,f(x) \\
&= \sum_{x=0}^{3} x\,f(x) \\
&= f(1) + 2\,f(2) + 3\,f(3) \\
&= \frac{30}{70} + 2\,\frac{30}{70} + 3\,\frac{5}{70} \\
&= \frac{30 + 60 + 15}{70} \\
&= \frac{105}{70} \\
&= 1.5.
\end{aligned}$$

Remark 4.3. Since they cannot possibly get 1.5 defective TV sets, it should
be noted that the term "expect" is not used in its colloquial sense. Indeed, it
should be interpreted as an average pertaining to repeated shipments made
under given conditions.
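The counts in Example 4.5 follow from binomial coefficients, which Python exposes as `math.comb`. The sketch below rebuilds the density and the expected value exactly; the names `defective`, `good`, and `shipped` are illustrative.

```python
from fractions import Fraction
from math import comb

defective, good, shipped = 3, 5, 4

# f(x) = C(3, x) C(5, 4 - x) / C(8, 4) for x = 0, 1, 2, 3.
pmf = {
    x: Fraction(comb(defective, x) * comb(good, shipped - x),
                comb(defective + good, shipped))
    for x in range(0, defective + 1)
}

expected = sum(x * p for x, p in pmf.items())
print(pmf, expected)
```

The result 3/2 also equals 4 times 3/8, the shipment size times the fraction of defective sets in the lot.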

Now we prove a result concerning the expected value operator E. 

Theorem 4.1. Let X be a random variable with pdf f(x). If a and b are
any two real numbers, then

$$E(aX + b) = a\,E(X) + b.$$

Proof: We will prove only for the continuous case.

$$\begin{aligned}
E(aX + b) &= \int_{-\infty}^{\infty} (a\,x + b)\,f(x)\,dx \\
&= \int_{-\infty}^{\infty} a\,x\,f(x)\,dx + \int_{-\infty}^{\infty} b\,f(x)\,dx \\
&= a \int_{-\infty}^{\infty} x\,f(x)\,dx + b \\
&= a\,E(X) + b.
\end{aligned}$$

To prove the discrete case, replace the integral by summation. This completes
the proof.

4.3. Variance of Random Variables 

The spread of the distribution of a random variable X is measured by its
variance.

Definition 4.4. Let X be a random variable with mean $\mu_X$. The variance
of X, denoted by Var(X), is defined as

$$Var(X) = E\left( [X - \mu_X]^2 \right).$$

It is also denoted by $\sigma_X^2$. The positive square root of the variance is
called the standard deviation of the random variable X. Like variance, the
standard deviation also measures the spread. The following theorem tells us
how to compute the variance in an alternative way.

Theorem 4.2. If X is a random variable with mean $\mu_X$ and variance $\sigma_X^2$,
then

$$\sigma_X^2 = E(X^2) - (\mu_X)^2.$$

Proof:

$$\begin{aligned}
\sigma_X^2 &= E\left( [X - \mu_X]^2 \right) \\
&= E\left( X^2 - 2\,\mu_X\,X + \mu_X^2 \right) \\
&= E(X^2) - 2\,\mu_X\,E(X) + (\mu_X)^2 \\
&= E(X^2) - 2\,\mu_X\,\mu_X + (\mu_X)^2 \\
&= E(X^2) - (\mu_X)^2.
\end{aligned}$$


Theorem 4.3. If X is a random variable with mean $\mu_X$ and variance $\sigma_X^2$,
then

$$Var(aX + b) = a^2\,Var(X),$$

where a and b are arbitrary real constants.

Proof:

$$\begin{aligned}
Var(aX + b) &= E\left( [(aX + b) - \mu_{aX+b}]^2 \right) \\
&= E\left( [aX + b - E(aX + b)]^2 \right) \\
&= E\left( [aX + b - a\,\mu_X - b]^2 \right) \\
&= E\left( a^2\,[X - \mu_X]^2 \right) \\
&= a^2\,E\left( [X - \mu_X]^2 \right) \\
&= a^2\,Var(X).
\end{aligned}$$
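Theorems 4.1, 4.2, and 4.3 can be checked on a small discrete distribution. The sketch below reuses the density of Example 3.6 (number of heads in 3 tosses of a fair coin) and the constants a = -2 and b = 5, which are arbitrary choices for illustration; `mean` and `variance` are hypothetical helper names.

```python
from fractions import Fraction

def mean(pmf):
    """E(X) for a discrete density given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E([X - mu]^2), computed directly from Definition 4.4."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Density of the number of heads in 3 tosses of a fair coin (Example 3.6).
pmf_x = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# Density of Y = aX + b; the map is one-to-one because a is nonzero.
a, b = Fraction(-2), Fraction(5)
pmf_y = {a * x + b: p for x, p in pmf_x.items()}

second_moment = sum(x**2 * p for x, p in pmf_x.items())
print(mean(pmf_y), variance(pmf_y), second_moment - mean(pmf_x) ** 2)
```

Note that the shift b changes the mean but drops out of the variance, and a negative a still scales the variance by the positive factor a squared.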

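Example 4.6. Let X have the density function

$$f(x) = \begin{cases}
\frac{2x}{k^2} & \text{for } 0 \leq x \leq k \\
0 & \text{otherwise.}
\end{cases}$$

For what value of k is the variance of X equal to 2?

Answer: The expected value of X is

$$\begin{aligned}
E(X) &= \int_0^k x\,f(x)\,dx \\
&= \int_0^k x\,\frac{2x}{k^2}\,dx \\
&= \frac{2}{3}\,k.
\end{aligned}$$

The second moment of X is

$$\begin{aligned}
E(X^2) &= \int_0^k x^2\,f(x)\,dx \\
&= \int_0^k x^2\,\frac{2x}{k^2}\,dx \\
&= \frac{2}{4}\,k^2.
\end{aligned}$$

Hence, the variance is given by

$$\begin{aligned}
Var(X) &= E(X^2) - (\mu_X)^2 \\
&= \frac{2}{4}\,k^2 - \frac{4}{9}\,k^2 \\
&= \frac{1}{18}\,k^2.
\end{aligned}$$

Since this variance is given to be 2, we get

$$\frac{1}{18}\,k^2 = 2$$

and this implies that $k = \pm 6$. But k must be greater than 0, since it is the
right end point of the interval [0, k]; hence k must be equal to 6.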

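Example 4.7. If the probability density function of the random variable X is

$$f(x) = \begin{cases}
1 - |x| & \text{for } |x| < 1 \\
0 & \text{otherwise,}
\end{cases}$$

then what is the variance of X?

Answer: Since $Var(X) = E(X^2) - \mu_X^2$, we need to find the first and second
moments of X. The first moment of X is given by

$$\begin{aligned}
\mu_X = E(X) &= \int_{-\infty}^{\infty} x\,f(x)\,dx \\
&= \int_{-1}^{1} x\,(1 - |x|)\,dx \\
&= \int_{-1}^{0} x\,(1 + x)\,dx + \int_{0}^{1} x\,(1 - x)\,dx \\
&= \int_{-1}^{0} (x + x^2)\,dx + \int_{0}^{1} (x - x^2)\,dx \\
&= \frac{1}{3} - \frac{1}{2} + \frac{1}{2} - \frac{1}{3} \\
&= 0.
\end{aligned}$$

The second moment $E(X^2)$ of X is given by

$$\begin{aligned}
E(X^2) &= \int_{-\infty}^{\infty} x^2\,f(x)\,dx \\
&= \int_{-1}^{1} x^2\,(1 - |x|)\,dx \\
&= \int_{-1}^{0} x^2\,(1 + x)\,dx + \int_{0}^{1} x^2\,(1 - x)\,dx \\
&= \int_{-1}^{0} (x^2 + x^3)\,dx + \int_{0}^{1} (x^2 - x^3)\,dx \\
&= \frac{1}{3} - \frac{1}{4} + \frac{1}{3} - \frac{1}{4} \\
&= \frac{1}{6}.
\end{aligned}$$

Thus, the variance of X is given by

$$Var(X) = E(X^2) - \mu_X^2 = \frac{1}{6} - 0 = \frac{1}{6}.$$

[Figure: The graph of the function f(x) = 1 - |x|]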



Example 4.8. Suppose the random variable X has mean $\mu$ and variance
$\sigma^2 > 0$. What are the values of the numbers a and b such that a + bX has
mean 0 and variance 1?

Answer: The mean of the random variable a + bX is required to be 0. Hence

$$\begin{aligned}
0 &= E(a + bX) \\
&= a + b\,E(X) \\
&= a + b\,\mu.
\end{aligned}$$

Thus $a = -b\,\mu$. Similarly, the variance of a + bX is 1. That is

$$\begin{aligned}
1 &= Var(a + bX) \\
&= b^2\,Var(X) \\
&= b^2\,\sigma^2.
\end{aligned}$$

Hence

$$b = \frac{1}{\sigma} \quad \text{and} \quad a = -\frac{\mu}{\sigma}$$

or

$$b = -\frac{1}{\sigma} \quad \text{and} \quad a = \frac{\mu}{\sigma}.$$

Example 4.9. Suppose X has the density function

\[
f(x) = \begin{cases} 3x^2 & \text{for } 0 < x < 1 \\ 0 & \text{otherwise.} \end{cases}
\]

What is the expected area of a random isosceles right triangle with
hypotenuse X?

[Figure: The Graph of the Density Function]


Answer: Let ABC denote this random isosceles right triangle. Let AC = x.
Then

\[
AB = BC = \frac{x}{\sqrt{2}}
\]

\[
\text{Area of } ABC = \frac{1}{2}\, \frac{x}{\sqrt{2}}\, \frac{x}{\sqrt{2}} = \frac{x^2}{4}.
\]

The expected area of this random triangle is given by

\[
E(\text{area of random } ABC) = \int_0^1 \frac{x^2}{4}\, 3x^2\, dx = \frac{3}{20}.
\]

For the next example, we need the following results. For \(-1 < x < 1\), let

\[
g(x) = \sum_{k=0}^{\infty} a\, x^k = \frac{a}{1 - x}.
\]

Then

\[
g'(x) = \sum_{k=1}^{\infty} a\, k\, x^{k-1} = \frac{a}{(1 - x)^2},
\]

and

\[
g''(x) = \sum_{k=2}^{\infty} a\, k\, (k - 1)\, x^{k-2} = \frac{2a}{(1 - x)^3}.
\]


Example 4.10. If the probability density function of the random variable
X is

\[
f(x) = \begin{cases} (1 - p)^{x-1}\, p & \text{if } x = 1, 2, 3, 4, \ldots \\ 0 & \text{otherwise,} \end{cases}
\]

then what is the variance of X?

Answer: We want to find the variance of X. But the variance of X is defined
as

\[
\begin{aligned}
Var(X) &= E(X^2) - [E(X)]^2 \\
&= E(X(X - 1)) + E(X) - [E(X)]^2.
\end{aligned}
\]

We write the variance in the above manner because \(E(X^2)\) is awkward to sum
directly, whereas \(E(X(X - 1))\) follows from the series for \(g''(x)\) given
above. From Example 4.3, we know that \(E(X) = \frac{1}{p}\). Hence, we now focus
on finding the second factorial moment of X, that is \(E(X(X - 1))\).

\[
\begin{aligned}
E(X(X - 1)) &= \sum_{x=1}^{\infty} x\, (x - 1)\, (1 - p)^{x-1}\, p \\
&= \sum_{x=2}^{\infty} x\, (x - 1)\, (1 - p)\, (1 - p)^{x-2}\, p \\
&= \frac{2\, p\, (1 - p)}{(1 - (1 - p))^3} = \frac{2\, (1 - p)}{p^2}.
\end{aligned}
\]

Hence

\[
Var(X) = E(X(X - 1)) + E(X) - [E(X)]^2 = \frac{2\, (1 - p)}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{1 - p}{p^2}.
\]
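
The closed forms \(E(X) = 1/p\) and \(Var(X) = (1 - p)/p^2\) can be compared with truncated sums of the defining series. This is only a rough sketch: the value p = 0.3 and the truncation at 10,000 terms are assumptions made for illustration.

```python
def geometric_moments(p, terms=10_000):
    """Approximate E(X) and Var(X) for f(x) = (1-p)^(x-1) p by truncating the series."""
    mean = 0.0
    second_factorial = 0.0   # running value of E(X(X-1))
    q_power = 1.0            # holds (1-p)^(x-1)
    for x in range(1, terms + 1):
        prob = q_power * p
        mean += x * prob
        second_factorial += x * (x - 1) * prob
        q_power *= 1.0 - p
    # Var(X) = E(X(X-1)) + E(X) - [E(X)]^2, as in the example above
    var = second_factorial + mean - mean ** 2
    return mean, var

p = 0.3  # assumed value of the success probability
mean, var = geometric_moments(p)
print(mean, 1 / p)
print(var, (1 - p) / p ** 2)
```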





4.4. Chebychev Inequality 

We have taken it for granted, in Section 4.2, that the standard deviation
(which is the positive square root of the variance) measures the spread of
a distribution of a random variable. The spread is measured by the area
between “two values”. The area under the pdf between two values is the
probability of X between the two values. If the standard deviation \(\sigma\)
measures the spread, then \(\sigma\) should control the area between the “two
values”.

It is well known that if the probability density function is standard
normal, that is

\[
f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} x^2}, \qquad -\infty < x < \infty,
\]

then the mean \(\mu = 0\) and the standard deviation \(\sigma = 1\), and the area
between the values \(\mu - \sigma\) and \(\mu + \sigma\) is about 68%.


[Figure: The Normal Density Function. The area between Mean - 1SD and Mean + 1SD is 68%.]


Similarly, the area between the values \(\mu - 2\sigma\) and \(\mu + 2\sigma\) is
about 95%. In this way, the standard deviation controls the area between the
values \(\mu - k\sigma\) and \(\mu + k\sigma\) for some k if the distribution is
standard normal. If we do not know the probability density function of a
random variable, can we find an estimate of the area between the values
\(\mu - k\sigma\) and \(\mu + k\sigma\) for some given k? This problem was solved by
Chebychev, a well known Russian mathematician. He proved that the area under
f(x) on the interval \([\mu - k\sigma,\ \mu + k\sigma]\) is at least \(1 - k^{-2}\).
This is equivalent to saying the probability that a random variable is within
k standard deviations of the mean is at least \(1 - k^{-2}\).

Theorem 4.4 (Chebychev Inequality). Let X be a random variable with
probability density function f(x). If \(\mu\) and \(\sigma > 0\) are the mean and
standard deviation of X, then

\[
P(|X - \mu| < k\, \sigma) \ge 1 - \frac{1}{k^2}
\]

for any positive real constant k.

Proof: We assume that the random variable X is continuous. If X is not
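continuous, we replace the integral by summation in the following proof. From
the definition of variance, we have the following:

\[
\begin{aligned}
\sigma^2 &= \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx \\
&= \int_{-\infty}^{\mu - k\sigma} (x - \mu)^2 f(x)\, dx
 + \int_{\mu - k\sigma}^{\mu + k\sigma} (x - \mu)^2 f(x)\, dx
 + \int_{\mu + k\sigma}^{\infty} (x - \mu)^2 f(x)\, dx.
\end{aligned}
\]

Since \(\int_{\mu - k\sigma}^{\mu + k\sigma} (x - \mu)^2 f(x)\, dx\) is nonnegative, we
get from the above

\[
\sigma^2 \ge \int_{-\infty}^{\mu - k\sigma} (x - \mu)^2 f(x)\, dx
 + \int_{\mu + k\sigma}^{\infty} (x - \mu)^2 f(x)\, dx. \tag{4.1}
\]

If \(x \in (-\infty,\ \mu - k\sigma)\), then

\[
x < \mu - k\, \sigma.
\]

Hence

\[
k\, \sigma < \mu - x
\]

and therefore

\[
k^2 \sigma^2 < (\mu - x)^2.
\]

That is \((\mu - x)^2 > k^2 \sigma^2\). Similarly, if \(x \in (\mu + k\sigma,\ \infty)\),
then

\[
x > \mu + k\, \sigma.
\]

Therefore

\[
k^2 \sigma^2 < (\mu - x)^2.
\]

Thus if \(x \notin (\mu - k\sigma,\ \mu + k\sigma)\), then

\[
(\mu - x)^2 \ge k^2 \sigma^2. \tag{4.2}
\]

Using (4.2) and (4.1), we get

\[
\sigma^2 \ge k^2 \sigma^2 \left[ \int_{-\infty}^{\mu - k\sigma} f(x)\, dx + \int_{\mu + k\sigma}^{\infty} f(x)\, dx \right].
\]

Hence

\[
\frac{1}{k^2} \ge \int_{-\infty}^{\mu - k\sigma} f(x)\, dx + \int_{\mu + k\sigma}^{\infty} f(x)\, dx.
\]

Therefore

\[
\frac{1}{k^2} \ge P(X \le \mu - k\, \sigma) + P(X \ge \mu + k\, \sigma).
\]

Thus

\[
\frac{1}{k^2} \ge P(|X - \mu| \ge k\, \sigma)
\]

which is

\[
P(|X - \mu| < k\, \sigma) \ge 1 - \frac{1}{k^2}.
\]

This completes the proof of this theorem.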
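
As an informal check of the inequality, the sketch below compares \(P(|X - \mu| < k\sigma)\) with the bound \(1 - 1/k^2\) for one discrete distribution. The fair die and the particular values of k are assumptions chosen for illustration; the proof above, with sums in place of integrals, covers this discrete case.

```python
from fractions import Fraction
from math import sqrt

def chebyshev_check(pmf, k):
    """Return (P(|X - mu| < k*sigma), 1 - 1/k^2) for a discrete pmf {value: prob}."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    sigma = sqrt(var)
    # total probability of the values strictly within k standard deviations of the mean
    inside = sum(p for x, p in pmf.items() if abs(x - mu) < k * sigma)
    return float(inside), 1 - 1 / k ** 2

# Assumed example distribution: a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}
for k in (1.2, 1.5, 2, 3):
    prob, bound = chebyshev_check(die, k)
    print(k, prob, bound)
```

In each row the first probability should be at least as large as the bound, and usually much larger, since the inequality makes no use of the shape of the distribution.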


The following integration formula

\[
\int_0^1 x^n (1 - x)^m\, dx = \frac{n!\; m!}{(n + m + 1)!}
\]

will be used in the next example. In this formula m and n represent any two
positive integers.

Example 4.11. Let the probability density function of a random variable
X be

\[
f(x) = \begin{cases} 630\, x^4 (1 - x)^4 & \text{if } 0 < x < 1 \\ 0 & \text{otherwise.} \end{cases}
\]

What is the exact value of \(P(|X - \mu| < 2\sigma)\)? What is the approximate value
of \(P(|X - \mu| < 2\sigma)\) when one uses the Chebychev inequality?

Answer: First, we find the mean and variance of the above distribution.
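The mean of X is given by

\[
\begin{aligned}
E(X) &= \int_0^1 x\, f(x)\, dx \\
&= \int_0^1 630\, x^5 (1 - x)^4\, dx \\
&= 630\, \frac{5!\; 4!}{(5 + 4 + 1)!} \\
&= 630\, \frac{5!\; 4!}{10!} \\
&= 630\, \frac{2880}{3628800} \\
&= \frac{630}{1260} \\
&= \frac{1}{2}.
\end{aligned}
\]

Similarly, the variance of X can be computed from

\[
\begin{aligned}
Var(X) &= \int_0^1 x^2 f(x)\, dx - \mu_X^2 \\
&= \int_0^1 630\, x^6 (1 - x)^4\, dx - \frac{1}{4} \\
&= 630\, \frac{6!\; 4!}{(6 + 4 + 1)!} - \frac{1}{4} \\
&= 630\, \frac{6!\; 4!}{11!} - \frac{1}{4} \\
&= \frac{6}{22} - \frac{1}{4} \\
&= \frac{12}{44} - \frac{11}{44} \\
&= \frac{1}{44}.
\end{aligned}
\]

Therefore, the standard deviation of X is

\[
\sigma = \sqrt{\frac{1}{44}} \approx 0.15.
\]

Thus

\[
\begin{aligned}
P(|X - \mu| < 2\sigma) &= P(|X - 0.5| < 0.3) \\
&= P(-0.3 < X - 0.5 < 0.3) \\
&= P(0.2 < X < 0.8) \\
&= \int_{0.2}^{0.8} 630\, x^4 (1 - x)^4\, dx \\
&= 0.96.
\end{aligned}
\]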

If we use the Chebychev inequality, then we get a lower bound for the
exact value we have. This bound is

\[
P(|X - \mu| < 2\sigma) \ge 1 - \frac{1}{4} = 0.75.
\]

Hence, the Chebychev inequality tells us that if we do not know the distribution
of X, then \(P(|X - \mu| < 2\sigma)\) is at least 0.75.
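
The exact probability in this example can be reproduced by expanding \((1 - x)^4\) with the binomial theorem and integrating term by term. The sketch below is one possible way to do this, not the book's method; it uses the rounded interval 0.2 to 0.8 from the answer above, and the function name `cdf` is hypothetical.

```python
from fractions import Fraction
from math import comb

def cdf(x):
    """Integral of 630 t^4 (1-t)^4 from 0 to x, via the binomial expansion of (1-t)^4."""
    x = Fraction(x)
    total = Fraction(0)
    for j in range(5):
        # term 630 * C(4,j) * (-1)^j * t^(4+j) integrates to the same constant times t^(5+j)/(5+j)
        total += 630 * comb(4, j) * (-1) ** j * x ** (5 + j) / (5 + j)
    return total

exact = cdf(Fraction(8, 10)) - cdf(Fraction(2, 10))   # P(0.2 < X < 0.8)
chebyshev_bound = 1 - Fraction(1, 2 ** 2)             # 1 - 1/k^2 with k = 2
print(float(exact), float(chebyshev_bound))
```

The exact value, about 0.96, is well above the Chebychev bound 0.75, which illustrates how conservative the bound can be when the distribution is known.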



The lower the standard deviation, the smaller the spread of the
distribution. If the standard deviation is zero, then the distribution has no
spread. This means that the distribution is concentrated at a single point. In
the literature, such distributions are called degenerate distributions. The
above figure shows how the spread decreases with the decrease of the standard
deviation.

4.5. Moment Generating Functions 

We have seen in Section 3 that there are some distributions, such as the
geometric distribution, whose moments are difficult to compute from the
definition. A moment generating function is a real valued function from which
one can generate all the moments of a given random variable. In many cases, it
is easier to compute various moments of X using the moment generating
function.

Definition 4.5. Let X be a random variable whose probability density
function is f(x). A real valued function \(M : \mathbb{R} \to \mathbb{R}\) defined by

\[
M(t) = E\left(e^{tX}\right)
\]

is called the moment generating function of X if this expected value exists
for all t in the interval \(-h < t < h\) for some \(h > 0\).

In general, not every random variable has a moment generating function. 
But if the moment generating function of a random variable exists, then it 
is unique. At the end of this section, we will give an example of a random 
variable which does not have a moment generating function. 

Using the definition of expected value of a random variable, we obtain
the explicit representation for M(t) as

\[
M(t) = \begin{cases}
\displaystyle \sum_{x \in R_X} e^{tx} f(x) & \text{if X is discrete} \\[8pt]
\displaystyle \int_{-\infty}^{\infty} e^{tx} f(x)\, dx & \text{if X is continuous.}
\end{cases}
\]

Example 4.12. Let X be a random variable whose moment generating
function is M(t) and n be any natural number. What is the \(n^{th}\) derivative
of M(t) at t = 0?

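Answer:

\[
\begin{aligned}
\frac{d}{dt} M(t) &= \frac{d}{dt} E\left(e^{tX}\right) \\
&= E\left(\frac{d}{dt}\, e^{tX}\right) \\
&= E\left(X e^{tX}\right).
\end{aligned}
\]

Similarly,

\[
\begin{aligned}
\frac{d^2}{dt^2} M(t) &= \frac{d^2}{dt^2} E\left(e^{tX}\right) \\
&= E\left(\frac{d^2}{dt^2}\, e^{tX}\right) \\
&= E\left(X^2 e^{tX}\right).
\end{aligned}
\]

Hence, in general we get

\[
\begin{aligned}
\frac{d^n}{dt^n} M(t) &= \frac{d^n}{dt^n} E\left(e^{tX}\right) \\
&= E\left(\frac{d^n}{dt^n}\, e^{tX}\right) \\
&= E\left(X^n e^{tX}\right).
\end{aligned}
\]

If we set t = 0 in the \(n^{th}\) derivative, we get

\[
\left. \frac{d^n}{dt^n} M(t) \right|_{t=0} = \left. E\left(X^n e^{tX}\right) \right|_{t=0} = E(X^n).
\]

Hence the \(n^{th}\) derivative of the moment generating function of X evaluated
at t = 0 is the \(n^{th}\) moment of X about the origin.

This example tells us that if we know the moment generating function of
a random variable, then we can generate all the moments of X by taking
derivatives of the moment generating function and then evaluating them at
zero.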

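Example 4.13. What is the moment generating function of the random
variable X whose probability density function is given by

\[
f(x) = \begin{cases} e^{-x} & \text{for } x > 0 \\ 0 & \text{otherwise?} \end{cases}
\]

What are the mean and variance of X?

Answer: The moment generating function of X is

\[
\begin{aligned}
M(t) &= E\left(e^{tX}\right) \\
&= \int_0^{\infty} e^{tx} f(x)\, dx \\
&= \int_0^{\infty} e^{tx} e^{-x}\, dx \\
&= \int_0^{\infty} e^{-(1-t)x}\, dx \\
&= \frac{1}{1 - t} \left[ -e^{-(1-t)x} \right]_0^{\infty} \\
&= \frac{1}{1 - t} \qquad \text{if } 1 - t > 0.
\end{aligned}
\]

The expected value of X can be computed from M(t) as

\[
\begin{aligned}
E(X) &= \left. \frac{d}{dt} M(t) \right|_{t=0} \\
&= \left. \frac{d}{dt} (1 - t)^{-1} \right|_{t=0} \\
&= \left. (1 - t)^{-2} \right|_{t=0} \\
&= \left. \frac{1}{(1 - t)^2} \right|_{t=0} \\
&= 1.
\end{aligned}
\]

Similarly

\[
\begin{aligned}
E(X^2) &= \left. \frac{d^2}{dt^2} M(t) \right|_{t=0} \\
&= \left. \frac{d^2}{dt^2} (1 - t)^{-1} \right|_{t=0} \\
&= \left. 2\, (1 - t)^{-3} \right|_{t=0} \\
&= \left. \frac{2}{(1 - t)^3} \right|_{t=0} \\
&= 2.
\end{aligned}
\]

Therefore, the variance of X is

\[
Var(X) = E(X^2) - (\mu)^2 = 2 - 1 = 1.
\]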



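Example 4.14. Let X have the probability density function

\[
f(x) = \begin{cases} \dfrac{1}{9} \left( \dfrac{8}{9} \right)^x & \text{for } x = 0, 1, 2, \ldots \\[6pt] 0 & \text{otherwise.} \end{cases}
\]

What is the moment generating function of the random variable X?

Answer:

\[
\begin{aligned}
M(t) &= E\left(e^{tX}\right) \\
&= \sum_{x=0}^{\infty} e^{tx} f(x) \\
&= \sum_{x=0}^{\infty} e^{tx}\, \frac{1}{9} \left( \frac{8}{9} \right)^x \\
&= \frac{1}{9} \sum_{x=0}^{\infty} \left( e^t\, \frac{8}{9} \right)^x \\
&= \frac{1}{9}\, \frac{1}{1 - e^t\, \frac{8}{9}} \qquad \text{if } e^t\, \frac{8}{9} < 1 \\
&= \frac{1}{9 - 8\, e^t} \qquad \text{if } t < \ln\left( \frac{9}{8} \right).
\end{aligned}
\]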


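Example 4.15. Let X be a continuous random variable with density function

\[
f(x) = \begin{cases} b\, e^{-bx} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases}
\]

where b > 0. If M(t) is the moment generating function of X, then what is
M(-6b)?

Answer:

\[
\begin{aligned}
M(t) &= E\left(e^{tX}\right) \\
&= \int_0^{\infty} b\, e^{tx} e^{-bx}\, dx \\
&= b \int_0^{\infty} e^{-(b-t)x}\, dx \\
&= \frac{b}{b - t} \left[ -e^{-(b-t)x} \right]_0^{\infty} \\
&= \frac{b}{b - t} \qquad \text{if } b - t > 0.
\end{aligned}
\]

Hence \(M(-6b) = \frac{b}{7b} = \frac{1}{7}\).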


Example 4.16. Let the random variable X have moment generating function
\(M(t) = (1 - t)^{-2}\) for t < 1. What is the third moment of X about the
origin?

Answer: To compute the third moment \(E(X^3)\) of X about the origin, we
need to compute the third derivative of M(t) at t = 0.

\[
\begin{aligned}
M(t) &= (1 - t)^{-2} \\
M'(t) &= 2\, (1 - t)^{-3} \\
M''(t) &= 6\, (1 - t)^{-4} \\
M'''(t) &= 24\, (1 - t)^{-5}.
\end{aligned}
\]

Thus the third moment of X is given by

\[
E\left(X^3\right) = \frac{24}{(1 - 0)^5} = 24.
\]

Theorem 4.5. Let M(t) be the moment generating function of the random
variable X. If

\[
M(t) = a_0 + a_1 t + a_2 t^2 + \cdots + a_n t^n + \cdots \tag{4.3}
\]

is the Taylor series expansion of M(t), then

\[
E(X^n) = (n!)\, a_n
\]

for all natural numbers n.

Proof: Let M(t) be the moment generating function of the random variable
X. The Taylor series expansion of M(t) about 0 is given by

\[
M(t) = M(0) + \frac{M'(0)}{1!}\, t + \frac{M''(0)}{2!}\, t^2 + \frac{M'''(0)}{3!}\, t^3 + \cdots + \frac{M^{(n)}(0)}{n!}\, t^n + \cdots
\]

Since \(E(X^n) = M^{(n)}(0)\) for \(n \ge 1\) and M(0) = 1, we have

\[
M(t) = 1 + \frac{E(X)}{1!}\, t + \frac{E(X^2)}{2!}\, t^2 + \frac{E(X^3)}{3!}\, t^3 + \cdots + \frac{E(X^n)}{n!}\, t^n + \cdots \tag{4.4}
\]

From (4.3) and (4.4), equating the coefficients of the like powers of t, we
obtain

\[
a_n = \frac{E(X^n)}{n!}
\]

which is

\[
E(X^n) = (n!)\, a_n.
\]


This proves the theorem.

Example 4.17. What is the \(479^{th}\) moment of X about the origin, if the
moment generating function of X is \(\frac{1}{1+t}\)?

Answer: The Taylor series expansion of \(M(t) = \frac{1}{1+t}\) can be obtained by
using long division (a technique we have learned in high school).

\[
\begin{aligned}
M(t) &= \frac{1}{1 + t} \\
&= \frac{1}{1 - (-t)} \\
&= 1 + (-t) + (-t)^2 + (-t)^3 + \cdots + (-t)^n + \cdots \\
&= 1 - t + t^2 - t^3 + t^4 + \cdots + (-1)^n t^n + \cdots
\end{aligned}
\]

Therefore \(a_n = (-1)^n\) and from this we obtain \(a_{479} = -1\). By Theorem 4.5,

\[
E\left(X^{479}\right) = (479!)\, a_{479} = -479!
\]
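
Theorem 4.5 turns a Taylor coefficient into a moment by a single multiplication. The following short sketch applies it to this example and, as a cross-check, to Example 4.16, using the standard expansion \((1 - t)^{-2} = \sum (n + 1)\, t^n\). The function names are hypothetical and chosen only for this illustration.

```python
from math import factorial

def moment_from_taylor(coefficient, n):
    """E(X^n) = n! * a_n, where a_n is the coefficient of t^n in the expansion of M(t)."""
    return factorial(n) * coefficient

def a(n):
    """Taylor coefficient of M(t) = 1/(1+t) about 0, namely (-1)^n."""
    return (-1) ** n

# Example 4.17: the 479th moment equals -479!
print(moment_from_taylor(a(479), 479) == -factorial(479))

# Example 4.16: M(t) = (1-t)^(-2) has coefficients a_n = n + 1, so E(X^3) = 3! * 4
print(moment_from_taylor(3 + 1, 3))
```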


Example 4.18. If the moment generating function of a random variable X is

\[
M(t) = \sum_{j=0}^{\infty} \frac{e^{(tj - 1)}}{j!},
\]

then what is the probability of the event X = 2?

Answer: By examining the given moment generating function of X, it
is easy to note that X is a discrete random variable with space
\(R_X = \{0, 1, 2, \ldots\}\). Hence by definition, the moment generating function
of X is

\[
M(t) = \sum_{j=0}^{\infty} e^{tj} f(j). \tag{4.5}
\]

But we are given that

\[
\begin{aligned}
M(t) &= \sum_{j=0}^{\infty} \frac{e^{(tj - 1)}}{j!} \\
&= \sum_{j=0}^{\infty} \frac{e^{-1}}{j!}\, e^{tj}.
\end{aligned}
\]

From (4.5) and the above, equating the coefficients of \(e^{tj}\), we get

\[
f(j) = \frac{e^{-1}}{j!} \qquad \text{for } j = 0, 1, 2, \ldots
\]

Thus, the probability of the event X = 2 is given by

\[
P(X = 2) = f(2) = \frac{e^{-1}}{2!} = \frac{1}{2e}.
\]

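Example 4.19. Let X be a random variable with

\[
E(X^n) = 0.8 \qquad \text{for } n = 1, 2, 3, \ldots
\]

What are the moment generating function and probability density function
of X?

Answer:

\[
\begin{aligned}
M(t) &= M(0) + \sum_{n=1}^{\infty} M^{(n)}(0)\, \frac{t^n}{n!} \\
&= M(0) + \sum_{n=1}^{\infty} E(X^n)\, \frac{t^n}{n!} \\
&= 1 + 0.8 \sum_{n=1}^{\infty} \frac{t^n}{n!} \\
&= 0.2 + 0.8 + 0.8 \sum_{n=1}^{\infty} \frac{t^n}{n!} \\
&= 0.2 + 0.8 \sum_{n=0}^{\infty} \frac{t^n}{n!} \\
&= 0.2\, e^{0 \cdot t} + 0.8\, e^{1 \cdot t}.
\end{aligned}
\]

Therefore, we get f(0) = P(X = 0) = 0.2 and f(1) = P(X = 1) = 0.8.
Hence the moment generating function of X is

\[
M(t) = 0.2 + 0.8\, e^t,
\]

and the probability density function of X is

\[
f(x) = \begin{cases} |x - 0.2| & \text{for } x = 0, 1 \\ 0 & \text{otherwise.} \end{cases}
\]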


Example 4.20. If the moment generating function of a random variable X
is given by

\[
M(t) = \frac{1}{15}\, e^{t} + \frac{2}{15}\, e^{2t} + \frac{3}{15}\, e^{3t} + \frac{4}{15}\, e^{4t} + \frac{5}{15}\, e^{5t},
\]

then what is the probability density function of X? What is the space of the
random variable X?

Answer: The moment generating function of X is given to be

\[
M(t) = \frac{1}{15}\, e^{t} + \frac{2}{15}\, e^{2t} + \frac{3}{15}\, e^{3t} + \frac{4}{15}\, e^{4t} + \frac{5}{15}\, e^{5t}.
\]

This suggests that X is a discrete random variable. Since X is a discrete
random variable, by definition of the moment generating function, we see
that

\[
\begin{aligned}
M(t) &= \sum_{x \in R_X} e^{tx} f(x) \\
&= e^{t x_1} f(x_1) + e^{t x_2} f(x_2) + e^{t x_3} f(x_3) + e^{t x_4} f(x_4) + e^{t x_5} f(x_5).
\end{aligned}
\]

Hence we have

\[
\begin{aligned}
f(x_1) &= f(1) = \frac{1}{15} \\
f(x_2) &= f(2) = \frac{2}{15} \\
f(x_3) &= f(3) = \frac{3}{15} \\
f(x_4) &= f(4) = \frac{4}{15} \\
f(x_5) &= f(5) = \frac{5}{15}.
\end{aligned}
\]

Therefore the probability density function of X is given by

\[
f(x) = \frac{x}{15} \qquad \text{for } x = 1, 2, 3, 4, 5
\]

and the space of the random variable X is

\[
R_X = \{1, 2, 3, 4, 5\}.
\]
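
The density recovered in this example can be sanity-checked numerically. The sketch below, whose helper names are hypothetical, confirms that the masses sum to one, computes the first two moments directly, and compares a central-difference approximation of \(M'(0)\) with \(E(X)\); the step size h is an arbitrary assumption.

```python
from fractions import Fraction
from math import exp

# The density found above: f(x) = x/15 for x = 1, ..., 5
pmf = {x: Fraction(x, 15) for x in range(1, 6)}

def mgf(t):
    """M(t) = sum over x of e^(t x) f(x) for the pmf above."""
    return sum(exp(t * x) * float(p) for x, p in pmf.items())

def moment(n):
    """n-th moment about the origin, E(X^n), computed directly from the pmf."""
    return sum(x ** n * p for x, p in pmf.items())

total = sum(pmf.values())
mean = moment(1)
var = moment(2) - mean ** 2

# A central difference approximates M'(0), which should be close to E(X).
h = 1e-5
approx_mean = (mgf(h) - mgf(-h)) / (2 * h)
print(total, mean, var, approx_mean)
```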

Example 4.21. If the probability density function of a discrete random
variable X is

\[
f(x) = \frac{6}{\pi^2 x^2}, \qquad \text{for } x = 1, 2, 3, \ldots,
\]

then what is the moment generating function of X?

Answer: If the moment generating function of X exists, then

\[
M(t) = E\left(e^{tX}\right) = \sum_{x=1}^{\infty} e^{tx}\, \frac{6}{\pi^2 x^2}.
\]

Now we show that the above infinite series diverges if t belongs to the interval
(-h, h) for any h > 0. To prove that this series is divergent, we do the ratio
test, that is

\[
\lim_{n \to \infty} \left( \frac{a_{n+1}}{a_n} \right)
= \lim_{n \to \infty} \left( \frac{e^{t(n+1)}\, 6}{\pi^2 (n+1)^2} \cdot \frac{\pi^2 n^2}{e^{tn}\, 6} \right)
= \lim_{n \to \infty} e^t \left( \frac{n}{n+1} \right)^2
= e^t.
\]

For any h > 0, since \(e^t\) is not less than 1 for all t in the interval
(-h, h) (it exceeds 1 whenever 0 < t < h), we conclude that the above infinite
series diverges and hence for this random variable X the moment generating
function does not exist.

Notice that for the above random variable, \(E[X^n]\) does not exist for
any natural number n. Hence the discrete random variable X in Example
4.21 has no moments. Similarly, the continuous random variable X whose
probability density function is

\[
f(x) = \begin{cases} \dfrac{1}{x^2} & \text{for } 1 \le x < \infty \\[6pt] 0 & \text{otherwise,} \end{cases}
\]

has no moment generating function and no moments.

In the following theorem we summarize some important properties of the 
moment generating function of a random variable. 

Theorem 4.6. Let X be a random variable with the moment generating
function \(M_X(t)\). If a and b are any two real constants (with \(b \neq 0\) in
(4.8)), then

\[
M_{X+a}(t) = e^{at}\, M_X(t) \tag{4.6}
\]

\[
M_{bX}(t) = M_X(b\, t) \tag{4.7}
\]

\[
M_{\frac{X+a}{b}}(t) = e^{\frac{a}{b} t}\, M_X\left( \frac{t}{b} \right). \tag{4.8}
\]

Proof: First, we prove (4.6).

\[
\begin{aligned}
M_{X+a}(t) &= E\left( e^{t(X+a)} \right) \\
&= E\left( e^{tX + ta} \right) \\
&= E\left( e^{tX} e^{ta} \right) \\
&= e^{ta}\, E\left( e^{tX} \right) \\
&= e^{ta}\, M_X(t).
\end{aligned}
\]

Similarly, we prove (4.7).

\[
\begin{aligned}
M_{bX}(t) &= E\left( e^{t(bX)} \right) \\
&= E\left( e^{(tb)X} \right) \\
&= M_X(t\, b).
\end{aligned}
\]

By using (4.6) and (4.7), we easily get (4.8).

\[
\begin{aligned}
M_{\frac{X+a}{b}}(t) &= M_{\frac{X}{b} + \frac{a}{b}}(t) \\
&= e^{\frac{a}{b} t}\, M_{\frac{X}{b}}(t) \\
&= e^{\frac{a}{b} t}\, M_X\left( \frac{t}{b} \right).
\end{aligned}
\]

This completes the proof of this theorem.

Definition 4.6. The \(n^{th}\) factorial moment of a random variable X is
\(E(X(X - 1)(X - 2) \cdots (X - n + 1))\).

Definition 4.7. The factorial moment generating function (FMGF) of X is
denoted by G(t) and defined as

\[
G(t) = E\left( t^X \right).
\]

It is not difficult to establish a relationship between the moment generating
function (MGF) and the factorial moment generating function (FMGF).
The relationship between them is the following:

\[
G(t) = E\left( t^X \right) = E\left( e^{\ln t^X} \right) = E\left( e^{X \ln t} \right) = M(\ln t).
\]

Thus, if we know the MGF of a random variable, we can determine its FMGF 
and conversely. 
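
The relationship \(G(t) = M(\ln t)\) is easy to confirm numerically for t > 0. In the sketch below the three-point distribution is purely hypothetical and serves only as a test case.

```python
from math import exp, log

# Hypothetical pmf used only for illustration: f(0) = 0.2, f(1) = 0.5, f(2) = 0.3.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

def mgf(t):
    """M(t) = E(e^(tX))."""
    return sum(exp(t * x) * p for x, p in pmf.items())

def fmgf(t):
    """G(t) = E(t^X), evaluated here only for t > 0."""
    return sum(t ** x * p for x, p in pmf.items())

for t in (0.5, 1.0, 2.0):
    # the two columns should agree up to rounding error
    print(t, fmgf(t), mgf(log(t)))
```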

Definition 4.8. Let X be a random variable. The characteristic function
\(\phi(t)\) of X is defined as

\[
\begin{aligned}
\phi(t) &= E\left( e^{itX} \right) \\
&= E\left( \cos(tX) + i \sin(tX) \right) \\
&= E\left( \cos(tX) \right) + i\, E\left( \sin(tX) \right).
\end{aligned}
\]

The probability density function can be recovered from the characteristic
function by using the following formula

\[
f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx}\, \phi(t)\, dt.
\]

Unlike the moment generating function, the characteristic function of a
random variable always exists. For example, the Cauchy random variable X
with probability density \(f(x) = \frac{1}{\pi (1 + x^2)}\) has no moment generating
function. However, the characteristic function is

\[
\begin{aligned}
\phi(t) &= E\left( e^{itX} \right) \\
&= \int_{-\infty}^{\infty} \frac{e^{itx}}{\pi (1 + x^2)}\, dx \\
&= e^{-|t|}.
\end{aligned}
\]

To evaluate the above integral one needs the theory of residues from
complex analysis.

The characteristic function \(\phi(t)\) satisfies the same set of properties as the
moment generating functions as given in Theorem 4.6.

The following integrals

\[
\int_0^{\infty} x^m e^{-x}\, dx = m! \qquad \text{if m is a positive integer}
\]

and

\[
\int_0^{\infty} \sqrt{x}\, e^{-x}\, dx = \frac{\sqrt{\pi}}{2}
\]

are needed for some problems in the Review Exercises of this chapter. These 
formulas will be discussed in Chapter 6 while we describe the properties and 
usefulness of the gamma distribution. 

We end this chapter with the following comment about Taylor series.
Taylor series was discovered to mimic the decimal expansion of real
numbers. For example

\[
125 = 1\, (10)^2 + 2\, (10)^1 + 5\, (10)^0
\]

is an expansion of the number 125 with respect to base 10. Similarly,

\[
125 = 1\, (9)^2 + 4\, (9)^1 + 8\, (9)^0
\]

is an expansion of the number 125 in base 9 and it is 148. Given a
function \(f : \mathbb{R} \to \mathbb{R}\) and \(x \in \mathbb{R}\), f(x) is a real number
and it can be expanded with respect to the base x. The expansion of f(x) with
respect to base x will have a form

\[
f(x) = a_0 x^0 + a_1 x^1 + a_2 x^2 + a_3 x^3 + \cdots
\]

which is

\[
f(x) = \sum_{k=0}^{\infty} a_k x^k.
\]

If we know the coefficients \(a_k\) for k = 0, 1, 2, 3, ..., then we will have the
expansion of f(x) in base x. Taylor found the remarkable fact that the
coefficients \(a_k\) can be computed if f(x) is sufficiently differentiable. He proved
that for k = 0, 1, 2, 3, ...

\[
a_k = \frac{f^{(k)}(0)}{k!}
\]

with \(f^{(0)}(0) = f(0)\).

4.6. Review Exercises


1. In a state lottery a five-digit integer is selected at random. If a player
bets 1 dollar on a particular number, the payoff (if that number is selected)
is $500 minus the $1 paid for the ticket. Let X equal the payoff to the bettor.
Find the expected value of X.


2. A discrete random variable X has probability density function of the form

\[
f(x) = \begin{cases} c\, (8 - x) & \text{for } x = 0, 1, 2, 3, 4, 5 \\ 0 & \text{otherwise.} \end{cases}
\]

(a) Find the constant c. (b) Find P(X > 2). (c) Find the expected value
E(X) for the random variable X.

3. A random variable X has a cumulative distribution function

\[
F(x) = \begin{cases} \frac{1}{2}\, x & \text{if } 0 < x < 1 \\[4pt] x - \frac{1}{2} & \text{if } 1 < x < \frac{3}{2}. \end{cases}
\]

(a) Graph F(x). (b) Graph f(x). (c) Find P(X < 0.5). (d) Find P(X > 0.5).
(e) Find P(X < 1.25). (f) Find P(X = 1.25).


4. Let X be a random variable with probability density function

\[
f(x) = \begin{cases} \frac{1}{8}\, x & \text{for } x = 1, 2, 5 \\ 0 & \text{otherwise.} \end{cases}
\]

(a) Find the expected value of X. (b) Find the variance of X. (c) Find the
expected value of 2X + 3. (d) Find the variance of 2X + 3. (e) Find the
expected value of \(3X - 5X^2 + 1\).

5. The measured radius of a circle, R, has probability density function

\[
f(r) = \begin{cases} 6\, r\, (1 - r) & \text{if } 0 < r < 1 \\ 0 & \text{otherwise.} \end{cases}
\]

(a) Find the expected value of the radius. (b) Find the expected
circumference. (c) Find the expected area.

6. Let X be a continuous random variable with density function

\[
f(x) = \begin{cases} \theta\, x + \frac{3}{2}\, \theta^{\frac{3}{2}}\, x^2 & \text{for } 0 < x < \frac{1}{\sqrt{\theta}} \\[4pt] 0 & \text{otherwise,} \end{cases}
\]

where \(\theta > 0\). What is the expected value of X?

7. Suppose X is a random variable with mean \(\mu\) and variance \(\sigma^2 > 0\). For
what value of a, where a > 0, is \(E\left( \left[ aX - \frac{1}{a} \right]^2 \right)\) minimized?

8. A rectangle is to be constructed having dimension X by 2X, where X is
a random variable with probability density function

\[
f(x) = \begin{cases} \frac{1}{2} & \text{for } 0 < x < 2 \\ 0 & \text{otherwise.} \end{cases}
\]

What is the expected area of the rectangle?

9. A box is to be constructed so that the height is 10 inches and its base 
is X inches by X inches. If X has a uniform distribution over the interval 
[2, 8], then what is the expected volume of the box in cubic inches?

10. If X is a random variable with density function

\[
f(x) = \begin{cases} 1.4\, e^{-2x} + 0.9\, e^{-3x} & \text{for } x > 0 \\ 0 & \text{elsewhere,} \end{cases}
\]

then what is the expected value of X?

11. A fair coin is tossed. If a head occurs, 1 die is rolled; if a tail occurs, 2
dice are rolled. Let X be the total on the die or dice. What is the expected
value of X?

12. If velocities of the molecules of a gas have the probability density 
(Maxwell’s law) 

( a v 2 e~ h v for v > 0 

fix) = < 

[ 0 otherwise, 

then what are the expectation and the variance of the velocity of the 
molecules and also the magnitude of a for some given h ? 

13. A couple decides to have children until they get a girl, but they agree to
stop with a maximum of 3 children even if they haven’t gotten a girl. If X
and Y denote the number of children and number of girls, respectively, then
what are E(X) and E(Y)?

14. In roulette, a wheel stops with equal probability at any of the 38 numbers
0, 00, 1, 2, ..., 36. If you bet $1 on a number, then you win $36 (net gain is
$35) if the number comes up; otherwise, you lose your dollar. What are your
expected winnings?

15. If the moment generating function for the random variable X is \(M_X(t)\) =
what is the third moment of X about the point x = 2?

16. If the mean and the variance of a certain distribution are 2 and 8 , what 
are the first three terms in the series expansion of the moment generating 
function? 

17. Let X be a random variable with density function

\[
f(x) = \begin{cases} a\, e^{-ax} & \text{for } x > 0 \\ 0 & \text{otherwise,} \end{cases}
\]

where a > 0. If M(t) denotes the moment generating function of X, what is
M(-3a)?

18. Suppose the random variable X has moment generating function

M(t)= (TVsp for f< y 

What is the \(n^{th}\) moment of X?

19. Two balls are dropped in such a way that each ball is equally likely to
fall into any one of four holes. Both balls may fall into the same hole. Let X
denote the number of unoccupied holes at the end of the experiment. What
is the moment generating function of X?

20. If the moment generating function of X is M(t) = for t < 1, then

what is the fourth moment of X?

21 . Let the random variable X have the moment generating function 

e 3t 

= —1 < * < 1 . 

What are the mean and the variance of X , respectively? 

22. Let the random variable X have the moment generating function

\[
M(t) = e^{3t + t^2}.
\]

What is the second moment of X about x = 0?

23. Suppose the random variable X has the cumulative density function
F(x). Show that the expected value of the random variable \((X - c)^2\) is
minimum if c equals the expected value of X.

24. Suppose the continuous random variable X has the cumulative density 
function F(x). Show that the expected value of the random variable \X — c\ 
is minimum if c equals the median of X (that is, F(c) = 0.5). 

25. Let the random variable X have the probability density function 

f(x) = ^ e - ^ — oo < x < oo. 

What are the expected value and the variance of X ? 

26. If \(M_X(t) = k\, (2 + 3 e^t)^4\), what is the value of k?

27. Given the moment generating function of X as

\[
M(t) = 1 + t + 4t^2 + 10t^3 + 14t^4 + \cdots
\]

what is the third moment of X about its mean?

28. A set of measurements X has a mean of 7 and standard deviation of 0.2.
For simplicity, a linear transformation Y = aX + b is to be applied to make
the mean and variance both equal to 1. What are the values of the constants
a and b?

29. A fair coin is to be tossed 3 times. The player receives 10 dollars if all
three turn up heads and pays 3 dollars if there is one or no heads. No gain or
loss is incurred otherwise. If Y is the gain of the player, what is the expected
value of Y?

30. If X has the probability density function

\[
f(x) = \begin{cases} e^{-x} & \text{for } x > 0 \\ 0 & \text{otherwise,} \end{cases}
\]

then what is the expected value of the random variable Y = ei x + 6 ? 

31. If the probability density function of the random variable X is

\[
f(x) = \begin{cases} (1 - p)^{x-1}\, p & \text{if } x = 1, 2, 3, \ldots \\ 0 & \text{otherwise,} \end{cases}
\]

then what is the expected value of the random variable \(X^{-1}\)?

Chapter 5

SOME SPECIAL
DISCRETE
DISTRIBUTIONS


Given a random experiment, we can find the set of all possible outcomes
which is known as the sample space. Objects in a sample space may not be
numbers. Thus, we use the notion of random variable to quantify the
qualitative elements of the sample space. A random variable is characterized by
either its probability density function or its cumulative distribution function.
The other characteristics of a random variable are its mean, variance and
moment generating function. In this chapter, we explore some frequently
encountered discrete distributions and study their important characteristics.


5.1. Bernoulli Distribution 

A Bernoulli trial is a random experiment in which there are precisely two
possible outcomes, which we conveniently call ‘failure’ (F) and ‘success’ (S).
We can define a random variable from the sample space {S, F} into the set
of real numbers as follows:

\[
X(F) = 0 \qquad X(S) = 1.
\]

The probability density function of this random variable is

\[
\begin{aligned}
f(0) &= P(X = 0) = 1 - p \\
f(1) &= P(X = 1) = p,
\end{aligned}
\]

where p denotes the probability of success. Hence

\[
f(x) = p^x (1 - p)^{1-x}, \qquad x = 0, 1.
\]

Definition 5.1. The random variable X is called the Bernoulli random
variable if its probability density function is of the form

\[
f(x) = p^x (1 - p)^{1-x}, \qquad x = 0, 1
\]

where p is the probability of success.

We denote the Bernoulli random variable by writing X ~ BER(p).

Example 5.1. What is the probability of getting a score of not less than 5
in a throw of a six-sided die?

Answer: Although there are six possible scores {1, 2, 3, 4, 5, 6}, we are
grouping them into two sets, namely {1, 2, 3, 4} and {5, 6}. Any score in
{1, 2, 3, 4} is a failure and any score in {5, 6} is a success. Thus, this is a
Bernoulli trial with

\[
P(X = 0) = P(\text{failure}) = \frac{4}{6} \quad \text{and} \quad P(X = 1) = P(\text{success}) = \frac{2}{6}.
\]

Hence, the probability of getting a score of not less than 5 in a throw of a
six-sided die is \(\frac{2}{6}\).

Theorem 5.1. If X is a Bernoulli random variable with parameter p, then
the mean, variance and moment generating function are respectively given
by

\[
\begin{aligned}
\mu_X &= p \\
\sigma_X^2 &= p\, (1 - p) \\
M_X(t) &= (1 - p) + p\, e^t.
\end{aligned}
\]

Proof: The mean of the Bernoulli random variable is

\[
\begin{aligned}
\mu_X &= \sum_{x=0}^{1} x\, f(x) \\
&= \sum_{x=0}^{1} x\, p^x (1 - p)^{1-x} \\
&= p.
\end{aligned}
\]

Similarly, the variance of X is given by

\[
\begin{aligned}
\sigma_X^2 &= \sum_{x=0}^{1} (x - \mu_X)^2 f(x) \\
&= \sum_{x=0}^{1} (x - p)^2\, p^x (1 - p)^{1-x} \\
&= p^2 (1 - p) + p\, (1 - p)^2 \\
&= p\, (1 - p)\, [p + (1 - p)] \\
&= p\, (1 - p).
\end{aligned}
\]

Next, we find the moment generating function of the Bernoulli random
variable

\[
\begin{aligned}
M(t) &= E\left( e^{tX} \right) \\
&= \sum_{x=0}^{1} e^{tx}\, p^x (1 - p)^{1-x} \\
&= (1 - p) + e^t p.
\end{aligned}
\]

This completes the proof. The moment generating function of X and all the
moments of X are shown below for p = 0.5. Note that for the Bernoulli
distribution all its moments about zero are the same and equal to p.

[Figure: MGF of X ~ Ber(0.5)]



5.2. Binomial Distribution 

Consider a fixed number n of mutually independent Bernoulli trials.
Suppose these trials have the same probability of success, say p. A random
variable X is called a binomial random variable if it represents the total
number of successes in n independent Bernoulli trials.

Now we determine the probability density function of a binomial random
variable. Recall that the probability density function of X is defined as

\[
f(x) = P(X = x).
\]

Thus, to find the probability density function of X we have to find the
probability of x successes in n independent trials.

If we have x successes in n trials, then the probability of each n-tuple
with x successes and n - x failures is

\[
p^x (1 - p)^{n-x}.
\]

However, there are \(\binom{n}{x}\) tuples with x successes and n - x failures in n
trials. Hence

\[
P(X = x) = \binom{n}{x}\, p^x (1 - p)^{n-x}.
\]

Therefore, the probability density function of X is

\[
f(x) = \binom{n}{x}\, p^x (1 - p)^{n-x}, \qquad x = 0, 1, \ldots, n.
\]

Definition 5.2. The random variable X is called the binomial random
variable with parameters p and n if its probability density function is of the
form

\[
f(x) = \binom{n}{x}\, p^x (1 - p)^{n-x}, \qquad x = 0, 1, \ldots, n
\]

where 0 < p < 1 is the probability of success.

We will denote a binomial random variable with parameters p and n as
X ~ BIN(n, p).



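Example 5.2. Is the real valued function f(x) given by

\[
f(x) = \binom{n}{x}\, p^x (1 - p)^{n-x}, \qquad x = 0, 1, \ldots, n
\]

where n and p are parameters, a probability density function?

Answer: To answer this question, we have to check that f(x) is nonnegative
and \(\sum_{x=0}^{n} f(x) = 1\). It is easy to see that \(f(x) \ge 0\). We show that the
sum is one.

\[
\begin{aligned}
\sum_{x=0}^{n} f(x) &= \sum_{x=0}^{n} \binom{n}{x}\, p^x (1 - p)^{n-x} \\
&= (p + 1 - p)^n \\
&= 1.
\end{aligned}
\]

Hence f(x) is really a probability density function.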

Example 5.3. On a five-question multiple-choice test there are five possible
answers, of which one is correct. If a student guesses randomly and
independently, what is the probability that she is correct only on questions 1
and 4?

Answer: Here the probability of success is \(p = \frac{1}{5}\), and thus \(1 - p = \frac{4}{5}\).
Therefore, the probability that she is correct only on questions 1 and 4 is

\[
\begin{aligned}
P(\text{correct on questions 1 and 4}) &= p^2 (1 - p)^3 \\
&= \left( \frac{1}{5} \right)^2 \left( \frac{4}{5} \right)^3 \\
&= \frac{64}{5^5} = 0.02048.
\end{aligned}
\]

Example 5.4. On a five-question multiple-choice test there are five possible
answers, of which one is correct. If a student guesses randomly and
independently, what is the probability that she is correct only on two questions?

Answer: Here the probability of success is \(p = \frac{1}{5}\), and thus \(1 - p = \frac{4}{5}\).
There are \(\binom{5}{2}\) different ways she can be correct on two questions.
Therefore, the probability that she is correct on two questions is

\[
\begin{aligned}
P(\text{correct on two questions}) &= \binom{5}{2}\, p^2 (1 - p)^3 \\
&= 10 \left( \frac{1}{5} \right)^2 \left( \frac{4}{5} \right)^3 \\
&= \frac{640}{5^5} = 0.2048.
\end{aligned}
\]


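Example 5.5. What is the probability of rolling two sixes and three nonsixes
in 5 independent casts of a fair die?

Answer: Let the random variable X denote the number of sixes in 5
independent casts of a fair die. Then X is a binomial random variable with
probability of success p and n = 5. The probability of getting a six is
\(p = \frac{1}{6}\). Hence

\[
\begin{aligned}
P(X = 2) = f(2) &= \binom{5}{2} \left( \frac{1}{6} \right)^2 \left( \frac{5}{6} \right)^3 \\
&= 10 \left( \frac{1}{36} \right) \left( \frac{125}{216} \right) \\
&= \frac{1250}{7776} = 0.160751.
\end{aligned}
\]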


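Example 5.6. What is the probability of rolling at most two sixes in 5 independent casts of a fair die?

Answer: Let the random variable $X$ denote the number of sixes in 5 independent casts of a fair die. Then $X$ is a binomial random variable with probability of success $p$ and $n = 5$. The probability of getting a six is $p = \frac{1}{6}$. Hence, the probability of rolling at most two sixes is

\[
\begin{aligned}
P(X \le 2) = F(2) &= f(0) + f(1) + f(2) \\
&= \sum_{x=0}^{2} \binom{5}{x} \left(\frac{1}{6}\right)^x \left(\frac{5}{6}\right)^{5-x} \\
&\approx \frac{1}{2} (0.9421 + 0.9734) = 0.9577 \quad \text{(from binomial table)}.
\end{aligned}
\]

The table value averages the entries for $p = 0.20$ and $p = 0.15$, since $p = \frac{1}{6}$ is not tabulated. Evaluating the sum directly gives $\frac{7500}{7776} \approx 0.9645$.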


[Figure: PDF of X ~ BIN(5, 1/6)]

Theorem 5.2. If $X$ is a binomial random variable with parameters $p$ and $n$, then the mean, variance and moment generating function are respectively given by

\[
\begin{aligned}
\mu_X &= np \\
\sigma_X^2 &= np(1-p) \\
M_X(t) &= \left[ (1-p) + p e^t \right]^n.
\end{aligned}
\]

Proof: First, we determine the moment generating function $M(t)$ of $X$ and then we generate the mean and variance from $M(t)$.

\[
\begin{aligned}
M(t) &= E\left(e^{tX}\right) \\
&= \sum_{x=0}^{n} e^{tx} \binom{n}{x} p^x (1-p)^{n-x} \\
&= \sum_{x=0}^{n} \binom{n}{x} \left(p e^t\right)^x (1-p)^{n-x} \\
&= \left(p e^t + 1 - p\right)^n.
\end{aligned}
\]

Hence

\[ M'(t) = n \left(p e^t + 1 - p\right)^{n-1} p e^t. \]

Therefore

\[ \mu_X = M'(0) = np. \]

Similarly

\[ M''(t) = n \left(p e^t + 1 - p\right)^{n-1} p e^t + n(n-1) \left(p e^t + 1 - p\right)^{n-2} \left(p e^t\right)^2. \]

Therefore

\[ E(X^2) = M''(0) = n(n-1) p^2 + np. \]

Hence

\[ Var(X) = E(X^2) - \mu_X^2 = n(n-1) p^2 + np - n^2 p^2 = np(1-p). \]

This completes the proof.
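The formulas for the binomial mean, variance and moment generating function can be sanity-checked by summing over the density directly. The sketch below is an illustration only; the helper names and the values $n = 20$, $p = 0.1$, $t = 0.3$ are arbitrary choices of ours.

```python
from math import comb, exp

def binom_pmf(x, n, p):
    """P(X = x) for X ~ BIN(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_moments(n, p):
    """Mean and variance of BIN(n, p) by direct summation over x = 0..n."""
    mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
    second = sum(x * x * binom_pmf(x, n, p) for x in range(n + 1))
    return mean, second - mean**2

def binom_mgf_sum(t, n, p):
    """E(e^{tX}) computed term by term from the density."""
    return sum(exp(t * x) * binom_pmf(x, n, p) for x in range(n + 1))

n, p, t = 20, 0.1, 0.3            # illustrative values
mean, var = binom_moments(n, p)   # compare with np and np(1 - p)
closed_form = ((1 - p) + p * exp(t))**n
print(mean, var, binom_mgf_sum(t, n, p), closed_form)
```

Up to floating-point rounding, the summed values agree with $np$, $np(1-p)$ and $[(1-p) + p e^t]^n$.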

Example 5.7. Suppose that 2000 points are selected independently and at random from the unit square $S = \{(x, y) \mid 0 \le x, y \le 1\}$. Let $X$ equal the number of points that fall in $A = \{(x, y) \in S \mid x^2 + y^2 < 1\}$. How is $X$ distributed? What are the mean, variance and standard deviation of $X$?

Answer: If a point falls in $A$, then it is a success. If a point falls in the complement of $A$, then it is a failure. The probability of success is

\[ p = \frac{\text{area of } A}{\text{area of } S} = \frac{\pi}{4}. \]

[Figure: Regions S and A]

Since the random variable represents the number of successes in 2000 independent trials, the random variable $X$ is binomial with parameters $p = \frac{\pi}{4}$ and $n = 2000$, that is $X \sim BIN\left(2000, \frac{\pi}{4}\right)$. Hence by Theorem 5.2,

\[ \mu_X = 2000 \, \frac{\pi}{4} = 1570.8, \]

and

\[ \sigma_X^2 = 2000 \, \frac{\pi}{4} \left(1 - \frac{\pi}{4}\right) = 337.1. \]

The standard deviation of $X$ is

\[ \sigma_X = \sqrt{337.1} = 18.36. \]
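The experiment in Example 5.7 is easy to simulate. The sketch below computes the mean and standard deviation from Theorem 5.2 and then draws one simulated value of $X$; the function name and the seed are our own arbitrary choices, and a single simulated count will of course vary with the seed.

```python
import math
import random

def count_in_quarter_disc(n_points, rng):
    """Count uniform points of the unit square with x^2 + y^2 < 1."""
    hits = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1.0:
            hits += 1
    return hits

n = 2000
p = math.pi / 4
mean = n * p                        # Theorem 5.2: np
sd = math.sqrt(n * p * (1 - p))     # Theorem 5.2: sqrt(np(1 - p))

rng = random.Random(12345)          # arbitrary seed, for reproducibility only
observed = count_in_quarter_disc(n, rng)
print(f"mean = {mean:.1f}, sd = {sd:.2f}, one simulated X = {observed}")
```

A simulated count should typically land within a few standard deviations of the mean.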


Example 5.8. Let the probability that the birth weight (in grams) of babies in America is less than 2547 grams be 0.1. If $X$ equals the number of babies that weigh less than 2547 grams at birth among 20 of these babies selected at random, then what is $P(X \le 3)$?

Answer: If a baby weighs less than 2547 grams, then it is a success; otherwise it is a failure. Thus $X$ is a binomial random variable with probability of success $p$ and $n = 20$. We are given that $p = 0.1$. Hence

\[ P(X \le 3) = \sum_{k=0}^{3} \binom{20}{k} \left(\frac{1}{10}\right)^k \left(\frac{9}{10}\right)^{20-k} = 0.867 \quad \text{(from table)}. \]


[Figure: PDF of X ~ BIN(20, 0.1)]

Example 5.9. Let $X_1, X_2, X_3$ be three independent Bernoulli random variables with the same probability of success $p$. What is the probability density function of the random variable $X = X_1 + X_2 + X_3$?

Answer: The sample space of the three independent Bernoulli trials is

\[ S = \{FFF, FFS, FSF, SFF, FSS, SFS, SSF, SSS\}. \]

The random variable $X = X_1 + X_2 + X_3$ represents the number of successes in each element of $S$. The following diagram illustrates this.

Let $p$ be the probability of success. Then

\[
\begin{aligned}
f(0) &= P(X = 0) = P(FFF) = (1-p)^3 \\
f(1) &= P(X = 1) = P(FFS) + P(FSF) + P(SFF) = 3p(1-p)^2 \\
f(2) &= P(X = 2) = P(FSS) + P(SFS) + P(SSF) = 3p^2(1-p) \\
f(3) &= P(X = 3) = P(SSS) = p^3.
\end{aligned}
\]

Hence

\[ f(x) = \binom{3}{x} p^x (1-p)^{3-x}, \qquad x = 0, 1, 2, 3. \]

Thus

\[ X \sim BIN(3, p). \]

In general, if $X_1, X_2, \ldots, X_n$ are independent and $X_i \sim BER(p)$, then $\sum_{i=1}^{n} X_i \sim BIN(n, p)$ and hence

\[ E\left(\sum_{i=1}^{n} X_i\right) = np \]

and

\[ Var\left(\sum_{i=1}^{n} X_i\right) = np(1-p). \]

The binomial distribution can arise whenever we select a random sample of $n$ units with replacement. Each unit in the population is classified into one of two categories according to whether it does or does not possess a certain property. For example, the unit may be a person and the property may be whether he intends to vote “yes”. If the unit is a machine part, the property may be whether the part is defective and so on. If the proportion of units in the population possessing the property of interest is $p$, and if $Z$ denotes the number of units in the sample of size $n$ that possess the given property, then

\[ Z \sim BIN(n, p). \]


5.3. Geometric Distribution 

If $X$ represents the total number of successes in $n$ independent Bernoulli trials, then the random variable

\[ X \sim BIN(n, p), \]

where $p$ is the probability of success of a single Bernoulli trial and the probability density function of $X$ is given by

\[ f(x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0, 1, \ldots, n. \]

Now let $X$ instead denote the trial number on which the first success occurs. The first success occurs on the $x$th trial exactly when the first $x - 1$ trials are failures and the $x$th trial is a success.

Hence the probability that the first success occurs on the $x$th trial is given by

\[ f(x) = P(X = x) = (1-p)^{x-1} p. \]

Hence, the probability density function of $X$ is

\[ f(x) = (1-p)^{x-1} p, \qquad x = 1, 2, 3, \ldots, \infty, \]

where $p$ denotes the probability of success in a single Bernoulli trial.

Definition 5.3. A random variable $X$ has a geometric distribution if its probability density function is given by

\[ f(x) = (1-p)^{x-1} p, \qquad x = 1, 2, 3, \ldots, \infty, \]

where $p$ denotes the probability of success in a single Bernoulli trial.

If $X$ has a geometric distribution we denote it as $X \sim GEO(p)$.



[Figure: PDF of X ~ GEO(0.1)]

Example 5.10. Is the real valued function $f(x)$ defined by

\[ f(x) = (1-p)^{x-1} p, \qquad x = 1, 2, 3, \ldots, \infty, \]

where $0 < p < 1$ is a parameter, a probability density function?

Answer: It is easy to check that $f(x) \ge 0$. Thus we only show that the sum is one.

\[
\begin{aligned}
\sum_{x=1}^{\infty} f(x) &= \sum_{x=1}^{\infty} (1-p)^{x-1} p \\
&= p \sum_{y=0}^{\infty} (1-p)^y, \qquad \text{where } y = x - 1 \\
&= p \, \frac{1}{1 - (1-p)} = 1.
\end{aligned}
\]

Hence $f(x)$ is a probability density function.

Example 5.11. The probability that a machine produces a defective item is 0.02. Each item is checked as it is produced. Assuming that these are independent trials, what is the probability that at least 100 items must be checked to find one that is defective?

Answer: Let $X$ denote the trial number on which the first defective item is observed. We want to find

\[
\begin{aligned}
P(X \ge 100) &= \sum_{x=100}^{\infty} f(x) \\
&= \sum_{x=100}^{\infty} (1-p)^{x-1} p \\
&= (1-p)^{99} \sum_{y=0}^{\infty} (1-p)^y p \\
&= (1-p)^{99} \\
&= (0.98)^{99} = 0.1353.
\end{aligned}
\]

Hence the probability that at least 100 items must be checked to find one that is defective is 0.1353.
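The tail probability in Example 5.11 can be checked in two ways: with the closed form $(1-p)^{k-1}$ and with a truncated version of the infinite sum. This is only a sketch; the function names and the truncation point 5000 are our own choices.

```python
def geom_pmf(x, p):
    """P(X = x) for X ~ GEO(p), x = 1, 2, 3, ..."""
    return (1 - p)**(x - 1) * p

def geom_tail(k, p):
    """P(X >= k) = (1 - p)**(k - 1): the first k - 1 trials all fail."""
    return (1 - p)**(k - 1)

p = 0.02
closed = geom_tail(100, p)
# Truncated version of the infinite sum used in the text; the omitted
# terms beyond x = 5000 are far below floating-point resolution.
partial = sum(geom_pmf(x, p) for x in range(100, 5000))
print(round(closed, 4), round(partial, 4))
```

Both values round to 0.1353, in agreement with the example.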

Example 5.12. A gambler plays roulette at Monte Carlo and continues gambling, wagering the same amount each time on “Red”, until he wins for the first time. If the probability of “Red” is $\frac{18}{38}$ and the gambler has only enough money for 5 trials, then (a) what is the probability that he will win before he exhausts his funds; (b) what is the probability that he wins on the second trial?

Answer: Let $X$ denote the trial on which he wins for the first time. Here

\[ p = P(\text{Red}) = \frac{18}{38}. \]

(a) Hence the probability that he will win before he exhausts his funds is given by

\[
\begin{aligned}
P(X \le 5) &= 1 - P(X \ge 6) \\
&= 1 - (1-p)^5 \\
&= 1 - \left(1 - \frac{18}{38}\right)^5 \\
&= 1 - (0.5263)^5 = 1 - 0.0404 = 0.9596.
\end{aligned}
\]

(b) Similarly, the probability that he wins on the second trial is given by

\[
\begin{aligned}
P(X = 2) &= f(2) \\
&= (1-p)^{2-1} p \\
&= \left(1 - \frac{18}{38}\right) \left(\frac{18}{38}\right) \\
&= \frac{360}{1444} = 0.2493.
\end{aligned}
\]

The following theorem provides us with the mean, variance and moment generating function of a random variable with the geometric distribution.

Theorem 5.3. If $X$ is a geometric random variable with parameter $p$, then the mean, variance and moment generating function are respectively given by

\[
\begin{aligned}
\mu_X &= \frac{1}{p} \\
\sigma_X^2 &= \frac{1-p}{p^2} \\
M_X(t) &= \frac{p e^t}{1 - (1-p) e^t}, \qquad \text{if } t < -\ln(1-p).
\end{aligned}
\]


Proof: First, we compute the moment generating function of $X$ and then we generate the mean and variance of $X$ from it.

\[
\begin{aligned}
M(t) &= \sum_{x=1}^{\infty} e^{tx} (1-p)^{x-1} p \\
&= p \sum_{y=0}^{\infty} e^{t(y+1)} (1-p)^y, \qquad \text{where } y = x - 1 \\
&= p e^t \sum_{y=0}^{\infty} \left(e^t (1-p)\right)^y \\
&= \frac{p e^t}{1 - (1-p) e^t}, \qquad \text{if } t < -\ln(1-p).
\end{aligned}
\]

Differentiating $M(t)$ with respect to $t$, we obtain

\[
\begin{aligned}
M'(t) &= \frac{\left(1 - (1-p) e^t\right) p e^t + p e^t (1-p) e^t}{\left[1 - (1-p) e^t\right]^2} \\
&= \frac{p e^t \left[1 - (1-p) e^t + (1-p) e^t\right]}{\left[1 - (1-p) e^t\right]^2} \\
&= \frac{p e^t}{\left[1 - (1-p) e^t\right]^2}.
\end{aligned}
\]

Hence

\[ \mu_X = E(X) = M'(0) = \frac{1}{p}. \]

Similarly, the second derivative of $M(t)$ can be obtained from the first derivative as

\[ M''(t) = \frac{\left[1 - (1-p) e^t\right]^2 p e^t + p e^t \, 2 \left[1 - (1-p) e^t\right] (1-p) e^t}{\left[1 - (1-p) e^t\right]^4}. \]

Hence

\[ M''(0) = \frac{p^3 + 2 p^2 (1-p)}{p^4} = \frac{2-p}{p^2}. \]

Therefore, the variance of $X$ is

\[
\begin{aligned}
\sigma_X^2 &= M''(0) - \left(M'(0)\right)^2 \\
&= \frac{2-p}{p^2} - \frac{1}{p^2} \\
&= \frac{1-p}{p^2}.
\end{aligned}
\]

This completes the proof of the theorem.
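As a numerical cross-check of the geometric mean, variance and moment generating function, one can truncate the defining series. The sketch below uses $p = 0.25$ and $t = 0.1$, which are illustrative values of ours (note that $t = 0.1 < -\ln(0.75)$, so the series converges); the function names are hypothetical.

```python
from math import exp

def geom_pmf(x, p):
    """P(X = x) for X ~ GEO(p), x = 1, 2, 3, ..."""
    return (1 - p)**(x - 1) * p

def geom_mgf(t, p):
    """Closed form of the geometric MGF, valid for t < -ln(1 - p)."""
    return p * exp(t) / (1 - (1 - p) * exp(t))

p = 0.25
xs = range(1, 400)      # terms beyond x = 400 are negligible for p = 0.25
mean = sum(x * geom_pmf(x, p) for x in xs)
var = sum(x * x * geom_pmf(x, p) for x in xs) - mean**2
mgf_sum = sum(exp(0.1 * x) * geom_pmf(x, p) for x in xs)
print(mean, var, mgf_sum, geom_mgf(0.1, p))
```

For $p = 0.25$ the theorem gives $1/p = 4$ and $(1-p)/p^2 = 12$, which the truncated sums reproduce up to rounding.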

Theorem 5.4. The random variable $X$ is geometric if and only if it satisfies the memoryless property, that is

\[ P(X > m + n \mid X > n) = P(X > m) \]

for all natural numbers $n$ and $m$.

Proof: It is easy to check that the geometric distribution satisfies the lack of memory property

\[ P(X > m + n \mid X > n) = P(X > m) \]

which is

\[ P(X > m + n \text{ and } X > n) = P(X > m) \, P(X > n). \tag{5.1} \]

If $X$ is geometric, that is $X \sim (1-p)^{x-1} p$, then

\[
\begin{aligned}
P(X > n + m) &= \sum_{x=n+m+1}^{\infty} (1-p)^{x-1} p \\
&= (1-p)^{n+m} \\
&= P(X > n) \, P(X > m).
\end{aligned}
\]

Hence the geometric distribution has the lack of memory property. Let $X$ be a random variable which satisfies the lack of memory property, that is

\[ P(X > m + n \text{ and } X > n) = P(X > m) \, P(X > n). \]

We want to show that $X$ is geometric. Define $g : \mathbb{N} \to \mathbb{R}$ by

\[ g(n) := P(X > n). \tag{5.2} \]

Using (5.2) in (5.1), we get

\[ g(m + n) = g(m) \, g(n) \qquad \forall \, m, n \in \mathbb{N}, \tag{5.3} \]

since $P(X > m + n \text{ and } X > n) = P(X > m + n)$. Letting $m = 1$ in (5.3), we see that

\[
\begin{aligned}
g(n + 1) &= g(n) \, g(1) \\
&= g(n - 1) \, g(1)^2 \\
&= g(n - 2) \, g(1)^3 \\
&= \cdots \\
&= g(n - (n - 1)) \, g(1)^n \\
&= g(1)^{n+1} \\
&= a^{n+1},
\end{aligned}
\]

where $a = g(1)$ is an arbitrary constant. Hence $g(n) = a^n$. From (5.2), we get

\[ 1 - F(n) = P(X > n) = a^n \]

and thus

\[ F(n) = 1 - a^n. \]

Since $F(n)$ is a distribution function

\[ 1 = \lim_{n \to \infty} F(n) = \lim_{n \to \infty} \left(1 - a^n\right). \]

From the above, we conclude that $0 < a < 1$. We rename the constant $a$ as $(1 - p)$. Thus,

\[ F(n) = 1 - (1-p)^n. \]

The probability density function of $X$ is hence

\[
\begin{aligned}
f(1) &= F(1) = p \\
f(2) &= F(2) - F(1) = 1 - (1-p)^2 - 1 + (1-p) = (1-p) p \\
f(3) &= F(3) - F(2) = 1 - (1-p)^3 - 1 + (1-p)^2 = (1-p)^2 p \\
&\;\;\vdots \\
f(x) &= F(x) - F(x-1) = (1-p)^{x-1} p.
\end{aligned}
\]

Thus $X$ is geometric with parameter $p$. This completes the proof.
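The memoryless property can be verified exactly for a geometric random variable using rational arithmetic. The following sketch checks $P(X > m + n \mid X > n) = P(X > m)$ for a small grid of $m$ and $n$; the choice $p = 1/6$ and the grid size are arbitrary illustrations of ours.

```python
from fractions import Fraction

def geom_tail(n, p):
    """P(X > n) = (1 - p)**n for X ~ GEO(p)."""
    return (1 - p)**n

p = Fraction(1, 6)
# P(X > m + n | X > n) = P(X > m + n) / P(X > n), compared with P(X > m).
checks = [
    geom_tail(m + n, p) / geom_tail(n, p) == geom_tail(m, p)
    for m in range(1, 6)
    for n in range(1, 6)
]
print(all(checks))
```

Because `Fraction` arithmetic is exact, the equality holds without any rounding tolerance.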

The difference between the binomial and the geometric distributions is the following. In the binomial distribution, the number of trials is predetermined, whereas in the geometric distribution it is the random variable.

5.4. Negative Binomial Distribution 

Let $X$ denote the trial number on which the $r$th success occurs. Here $r$ is a positive integer. This is equivalent to saying that the random variable $X$ denotes the number of trials needed to observe the $r$th success. Suppose we want to find the probability that the fifth head is observed on the 10th independent flip of an unbiased coin. This is a case of finding $P(X = 10)$. Let us find the general case $P(X = x)$.

\[
\begin{aligned}
P(X = x) &= P(\text{first } x - 1 \text{ trials contain } x - r \text{ failures and } r - 1 \text{ successes}) \\
&\qquad \cdot P(r\text{th success in } x\text{th trial}) \\
&= \binom{x-1}{r-1} p^{r-1} (1-p)^{x-r} \, p \\
&= \binom{x-1}{r-1} p^r (1-p)^{x-r}, \qquad x = r, r+1, \ldots, \infty.
\end{aligned}
\]

Hence the probability density function of the random variable $X$ is given by

\[ f(x) = \binom{x-1}{r-1} p^r (1-p)^{x-r}, \qquad x = r, r+1, \ldots, \infty. \]

Notice that, if $x$ is taken instead to count the number of failures before the $r$th success, this probability density function can also be expressed as

\[ f(x) = \binom{x+r-1}{r-1} p^r (1-p)^x, \qquad x = 0, 1, \ldots, \infty. \]


Definition 5.4. A random variable $X$ has the negative binomial (or Pascal) distribution if its probability density function is of the form

\[ f(x) = \binom{x-1}{r-1} p^r (1-p)^{x-r}, \qquad x = r, r+1, \ldots, \infty, \]

where $p$ is the probability of success in a single Bernoulli trial. We denote the random variable $X$ whose distribution is the negative binomial distribution by writing $X \sim NBIN(r, p)$.

We need the following technical result to show that the above function is really a probability density function. The technical result we are going to establish is called the negative binomial theorem.

Theorem 5.5. Let $r$ be a positive integer. Then

\[ \sum_{x=r}^{\infty} \binom{x-1}{r-1} y^{x-r} = (1-y)^{-r} \]

where $|y| < 1$.

Proof: Define

\[ h(y) = (1-y)^{-r}. \]

Now expanding $h(y)$ by the Taylor series method about $y = 0$, we get

\[ (1-y)^{-r} = \sum_{k=0}^{\infty} \frac{h^{(k)}(0)}{k!} \, y^k, \]

where $h^{(k)}(y)$ is the $k$th derivative of $h$. This $k$th derivative of $h(y)$ can be directly computed and direct computation gives

\[ h^{(k)}(y) = r (r+1) (r+2) \cdots (r+k-1) \, (1-y)^{-(r+k)}. \]

Hence, we get

\[ h^{(k)}(0) = r (r+1) (r+2) \cdots (r+k-1) = \frac{(r+k-1)!}{(r-1)!}. \]

Substituting this into the Taylor expansion of $h(y)$, we get

\[
\begin{aligned}
(1-y)^{-r} &= \sum_{k=0}^{\infty} \frac{(r+k-1)!}{(r-1)! \, k!} \, y^k \\
&= \sum_{k=0}^{\infty} \binom{r+k-1}{r-1} y^k.
\end{aligned}
\]

Letting $x = k + r$, we get

\[ (1-y)^{-r} = \sum_{x=r}^{\infty} \binom{x-1}{r-1} y^{x-r}. \]

This completes the proof of the theorem.

Theorem 5.5 can also be proved using the geometric series

\[ \sum_{n=0}^{\infty} y^n = \frac{1}{1-y} \tag{5.4} \]

where $|y| < 1$. Differentiating both sides of the equality (5.4) $k$ times and then simplifying, we have

\[ \sum_{n=k}^{\infty} \binom{n}{k} y^{n-k} = \frac{1}{(1-y)^{k+1}}. \tag{5.5} \]

Letting $n = x - 1$ and $k = r - 1$ in (5.5), we have the asserted result.
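Example 5.13. Is the real valued function defined by

\[ f(x) = \binom{x-1}{r-1} p^r (1-p)^{x-r}, \qquad x = r, r+1, \ldots, \infty, \]

where $0 < p < 1$ is a parameter, a probability density function?

Answer: It is easy to check that $f(x) \ge 0$. Now we show that $\sum_{x=r}^{\infty} f(x)$ is equal to one. Using Theorem 5.5 with $y = 1 - p$,

\[
\begin{aligned}
\sum_{x=r}^{\infty} f(x) &= \sum_{x=r}^{\infty} \binom{x-1}{r-1} p^r (1-p)^{x-r} \\
&= p^r \sum_{x=r}^{\infty} \binom{x-1}{r-1} (1-p)^{x-r} \\
&= p^r \left(1 - (1-p)\right)^{-r} \\
&= p^r \, p^{-r} \\
&= 1.
\end{aligned}
\]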
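A short numerical illustration of the negative binomial density may help here. The sketch below evaluates the case mentioned at the start of this section (fifth head on the 10th flip of an unbiased coin) exactly, and checks that a long partial sum of the density is close to one; the function name and the values $r = 3$, $p = 0.4$ are our own illustrative choices.

```python
from fractions import Fraction
from math import comb

def nbinom_pmf(x, r, p):
    """P(X = x) for X ~ NBIN(r, p): the r-th success occurs on trial x."""
    if x < r:
        return 0
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

# Fifth head on the tenth flip of an unbiased coin, as an exact fraction.
fifth_head_on_tenth = nbinom_pmf(10, 5, Fraction(1, 2))

# Partial sum of the density; the omitted tail is tiny for these values.
partial_total = sum(nbinom_pmf(x, 3, 0.4) for x in range(3, 400))
print(fifth_head_on_tenth, partial_total)
```

Using `Fraction` for $p$ keeps the first result exact, while the partial sum is only a floating-point approximation of the infinite series.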


Computing the mean and variance of the negative binomial distribution using the definition is difficult. However, if we use the moment generating function approach, then it is not so difficult. Hence in the next example, we determine the moment generating function of this negative binomial distribution.

Example 5.14. What is the moment generating function of the random variable $X$ whose probability density function is

\[ f(x) = \binom{x-1}{r-1} p^r (1-p)^{x-r}, \qquad x = r, r+1, \ldots, \infty? \]


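Answer: The moment generating function of this negative binomial random variable is

\[
\begin{aligned}
M(t) &= \sum_{x=r}^{\infty} e^{tx} f(x) \\
&= \sum_{x=r}^{\infty} e^{tx} \binom{x-1}{r-1} p^r (1-p)^{x-r} \\
&= p^r \sum_{x=r}^{\infty} e^{t(x-r)} e^{tr} \binom{x-1}{r-1} (1-p)^{x-r} \\
&= p^r e^{tr} \sum_{x=r}^{\infty} \binom{x-1}{r-1} e^{t(x-r)} (1-p)^{x-r} \\
&= p^r e^{tr} \sum_{x=r}^{\infty} \binom{x-1}{r-1} \left[e^t (1-p)\right]^{x-r} \\
&= p^r e^{tr} \left[1 - (1-p) e^t\right]^{-r} \\
&= \left(\frac{p e^t}{1 - (1-p) e^t}\right)^r, \qquad \text{if } t < -\ln(1-p).
\end{aligned}
\]

The following theorem can easily be proved.

Theorem 5.6. If $X \sim NBIN(r, p)$, then

\[
\begin{aligned}
E(X) &= \frac{r}{p} \\
Var(X) &= \frac{r (1-p)}{p^2} \\
M(t) &= \left(\frac{p e^t}{1 - (1-p) e^t}\right)^r, \qquad \text{if } t < -\ln(1-p).
\end{aligned}
\]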


Example 5.15. What is the probability that the fifth head is observed on the 10th independent flip of a fair coin?

Answer: Let $X$ denote the number of trials needed to observe the 5th head. Hence $X$ has a negative binomial distribution with $r = 5$ and $p = \frac{1}{2}$.

We want to find

\[
\begin{aligned}
P(X = 10) &= f(10) \\
&= \binom{9}{4} p^5 (1-p)^5 \\
&= \binom{9}{4} \left(\frac{1}{2}\right)^{10} \\
&= \frac{63}{512}.
\end{aligned}
\]

We close this section with the following comment. In the negative binomial distribution the parameter $r$ is a positive integer. One can generalize the negative binomial distribution to allow noninteger values of the parameter $r$. To do this let us write the probability density function of the negative binomial distribution as

\[
\begin{aligned}
f(x) &= \binom{x-1}{r-1} p^r (1-p)^{x-r} \\
&= \frac{(x-1)!}{(r-1)! \, (x-r)!} \, p^r (1-p)^{x-r} \\
&= \frac{\Gamma(x)}{\Gamma(r) \, \Gamma(x-r+1)} \, p^r (1-p)^{x-r}, \qquad \text{for } x = r, r+1, \ldots, \infty,
\end{aligned}
\]

where

\[ \Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t} \, dt \]

is the well known gamma function. The gamma function generalizes the notion of factorial and it will be treated in the next chapter.
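The gamma-function form of the density can be evaluated for noninteger $r$ with the standard library. The sketch below is only an illustration under our own assumptions: the function name is hypothetical, the value $r = 2.5$ is arbitrary, and the density is computed on the log scale with `math.lgamma` to avoid overflow.

```python
from math import comb, exp, lgamma, log

def nbinom_pmf_gamma(x, r, p):
    """Gamma(x) / (Gamma(r) Gamma(x - r + 1)) * p**r * (1 - p)**(x - r).

    Evaluated on the log scale with lgamma to avoid overflow.
    """
    log_coeff = lgamma(x) - lgamma(r) - lgamma(x - r + 1)
    return exp(log_coeff + r * log(p) + (x - r) * log(1 - p))

# For integer r this agrees with the binomial-coefficient form.
integer_case = nbinom_pmf_gamma(10, 5, 0.5)
binomial_form = comb(9, 4) * 0.5**10

# A noninteger r (hypothetical value) with support x = r, r + 1, r + 2, ...
r = 2.5
total = sum(nbinom_pmf_gamma(r + k, r, 0.4) for k in range(400))
print(integer_case, binomial_form, total)
```

For integer $r$ the two forms coincide, and for the noninteger $r$ chosen here the probabilities over $x = r, r+1, \ldots$ still sum to approximately one.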

5.5. Hypergeometric Distribution 

Consider a collection of $n$ objects which can be classified into two classes, say class 1 and class 2. Suppose that there are $n_1$ objects in class 1 and $n_2$ objects in class 2. A collection of $r$ objects is selected from these $n$ objects at random and without replacement. We are interested in finding out the probability that exactly $x$ of these $r$ objects are from class 1. If $x$ of these $r$ objects are from class 1, then the remaining $r - x$ objects must be from class 2. We can select $x$ objects from class 1 in any one of $\binom{n_1}{x}$ ways. Similarly, the remaining $r - x$ objects can be selected in $\binom{n_2}{r-x}$ ways. Thus, the number of ways one can select a subset of $r$ objects from a set of $n$ objects, such that $x$ number of objects will be from class 1 and $r - x$ number of objects will be from class 2, is given by $\binom{n_1}{x} \binom{n_2}{r-x}$. Hence,

\[ P(X = x) = \frac{\binom{n_1}{x} \binom{n_2}{r-x}}{\binom{n_1+n_2}{r}}, \]

where $x \le r$, $x \le n_1$ and $r - x \le n_2$.

[Figure: From $n_1 + n_2$ objects select $r$ objects such that $x$ objects are of class I and $r - x$ are of class II]


Definition 5.5. A random variable $X$ is said to have a hypergeometric distribution if its probability density function is of the form

\[ f(x) = \frac{\binom{n_1}{x} \binom{n_2}{r-x}}{\binom{n_1+n_2}{r}}, \qquad x = 0, 1, 2, \ldots, r, \]

where $x \le n_1$ and $r - x \le n_2$, with $n_1$ and $n_2$ being two positive integers. We shall denote such a random variable by writing

\[ X \sim HYP(n_1, n_2, r). \]


Example 5.16. Suppose there are 3 defective items in a lot of 50 items. A 
sample of size 10 is taken at random and without replacement. Let X denote 
the number of defective items in the sample. What is the probability that 
the sample contains at most one defective item? 

Answer: Clearly, $X \sim HYP(3, 47, 10)$. Hence the probability that the sample contains at most one defective item is

\[
\begin{aligned}
P(X \le 1) &= P(X = 0) + P(X = 1) \\
&= \frac{\binom{3}{0} \binom{47}{10}}{\binom{50}{10}} + \frac{\binom{3}{1} \binom{47}{9}}{\binom{50}{10}} \\
&= 0.504 + 0.398 \\
&= 0.902.
\end{aligned}
\]

Example 5.17. A random sample of 5 students is drawn without replacement from among 300 seniors, and each of these 5 seniors is asked if she/he has tried a certain drug. Suppose 50% of the seniors actually have tried the drug. What is the probability that two of the students interviewed have tried the drug?

Answer: Let $X$ denote the number of students interviewed who have tried the drug. Since 150 of the 300 seniors have tried the drug, $X \sim HYP(150, 150, 5)$. Hence the probability that two of the students interviewed have tried the drug is

\[ P(X = 2) = \frac{\binom{150}{2} \binom{150}{3}}{\binom{300}{5}} = 0.3146. \]


[Figure: PDF of X ~ HYP(150, 150, 5)]


Example 5.18. A radio supply house has 200 transistor radios, of which 3 are improperly soldered and 197 are properly soldered. The supply house randomly draws 4 radios without replacement and sends them to a customer. What is the probability that the supply house sends 2 improperly soldered radios to its customer?

Answer: The probability that the supply house sends 2 improperly soldered radios to its customer is

\[ P(X = 2) = \frac{\binom{3}{2} \binom{197}{2}}{\binom{200}{4}} = 0.000895. \]
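The hypergeometric probabilities in Examples 5.16 to 5.18 involve large binomial coefficients, so a direct computation is a useful check. The following sketch is an illustration only; `hyper_pmf` and the variable names are our own.

```python
from math import comb

def hyper_pmf(x, n1, n2, r):
    """P(X = x) for X ~ HYP(n1, n2, r)."""
    return comb(n1, x) * comb(n2, r - x) / comb(n1 + n2, r)

# Example 5.16: at most one defective item in a sample of 10.
ex_5_16 = hyper_pmf(0, 3, 47, 10) + hyper_pmf(1, 3, 47, 10)
# Example 5.17: two of five interviewed students have tried the drug.
ex_5_17 = hyper_pmf(2, 150, 150, 5)
# Example 5.18: two improperly soldered radios among four.
ex_5_18 = hyper_pmf(2, 3, 197, 4)

print(round(ex_5_16, 3), round(ex_5_17, 4), round(ex_5_18, 6))
```

Python integers have arbitrary precision, so the binomial coefficients are computed exactly before the final division.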


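Theorem 5.7. If $X \sim HYP(n_1, n_2, r)$, then

\[
\begin{aligned}
E(X) &= r \, \frac{n_1}{n_1 + n_2} \\
Var(X) &= r \left(\frac{n_1}{n_1 + n_2}\right) \left(\frac{n_2}{n_1 + n_2}\right) \left(\frac{n_1 + n_2 - r}{n_1 + n_2 - 1}\right).
\end{aligned}
\]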


Proof: Let $X \sim HYP(n_1, n_2, r)$. We compute the mean and variance of $X$ by computing the first and the second factorial moments of the random variable $X$. First, we compute the first factorial moment (which is the same as the expected value) of $X$. The expected value of $X$ is given by

\[
\begin{aligned}
E(X) &= \sum_{x=0}^{r} x f(x) \\
&= \sum_{x=0}^{r} x \, \frac{\binom{n_1}{x} \binom{n_2}{r-x}}{\binom{n_1+n_2}{r}} \\
&= n_1 \sum_{x=1}^{r} \frac{(n_1 - 1)!}{(x-1)! \, (n_1 - x)!} \, \frac{\binom{n_2}{r-x}}{\binom{n_1+n_2}{r}} \\
&= n_1 \sum_{x=1}^{r} \frac{\binom{n_1-1}{x-1} \binom{n_2}{r-x}}{\frac{n_1+n_2}{r} \binom{n_1+n_2-1}{r-1}} \\
&= r \, \frac{n_1}{n_1 + n_2} \sum_{y=0}^{r-1} \frac{\binom{n_1-1}{y} \binom{n_2}{r-1-y}}{\binom{n_1+n_2-1}{r-1}}, \qquad \text{where } y = x - 1 \\
&= r \, \frac{n_1}{n_1 + n_2}.
\end{aligned}
\]

The last equality is obtained since

\[ \sum_{y=0}^{r-1} \frac{\binom{n_1-1}{y} \binom{n_2}{r-1-y}}{\binom{n_1+n_2-1}{r-1}} = 1. \]


Similarly, we find the second factorial moment of $X$ to be

\[ E(X(X - 1)) = \frac{r (r-1) \, n_1 (n_1 - 1)}{(n_1 + n_2)(n_1 + n_2 - 1)}. \]

Therefore, the variance of $X$ is

\[
\begin{aligned}
Var(X) &= E(X^2) - E(X)^2 \\
&= E(X(X - 1)) + E(X) - E(X)^2 \\
&= \frac{r (r-1) \, n_1 (n_1 - 1)}{(n_1 + n_2)(n_1 + n_2 - 1)} + r \, \frac{n_1}{n_1 + n_2} - \left(r \, \frac{n_1}{n_1 + n_2}\right)^2 \\
&= r \left(\frac{n_1}{n_1 + n_2}\right) \left(\frac{n_2}{n_1 + n_2}\right) \left(\frac{n_1 + n_2 - r}{n_1 + n_2 - 1}\right).
\end{aligned}
\]

5.6. Poisson Distribution 

In this section, we define an important discrete distribution which is 
widely used for modeling many real life situations. First, we define this 
distribution and then we present some of its important properties. 

Definition 5.6. A random variable $X$ is said to have a Poisson distribution if its probability density function is given by

\[ f(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots, \infty, \]

where $0 < \lambda < \infty$ is a parameter. We denote such a random variable by $X \sim POI(\lambda)$.

The probability density function $f$ is called the Poisson distribution after Simeon D. Poisson (1781-1840).

Example 5.19. Is the real valued function defined by

\[ f(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots, \infty, \]

where $0 < \lambda < \infty$ is a parameter, a probability density function?

Answer: It is easy to check that $f(x) \ge 0$. We show that $\sum_{x=0}^{\infty} f(x)$ is equal to one.

\[
\begin{aligned}
\sum_{x=0}^{\infty} f(x) &= \sum_{x=0}^{\infty} \frac{e^{-\lambda} \lambda^x}{x!} \\
&= e^{-\lambda} \sum_{x=0}^{\infty} \frac{\lambda^x}{x!} \\
&= e^{-\lambda} e^{\lambda} = 1.
\end{aligned}
\]


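Theorem 5.8. If $X \sim POI(\lambda)$, then

\[
\begin{aligned}
E(X) &= \lambda \\
Var(X) &= \lambda \\
M(t) &= e^{\lambda (e^t - 1)}.
\end{aligned}
\]

Proof: First, we find the moment generating function of $X$.

\[
\begin{aligned}
M(t) &= \sum_{x=0}^{\infty} e^{tx} f(x) \\
&= \sum_{x=0}^{\infty} e^{tx} \, \frac{e^{-\lambda} \lambda^x}{x!} \\
&= e^{-\lambda} \sum_{x=0}^{\infty} \frac{e^{tx} \lambda^x}{x!} \\
&= e^{-\lambda} \sum_{x=0}^{\infty} \frac{\left(e^t \lambda\right)^x}{x!} \\
&= e^{-\lambda} e^{\lambda e^t} \\
&= e^{\lambda (e^t - 1)}.
\end{aligned}
\]

Thus,

\[ M'(t) = \lambda e^t e^{\lambda (e^t - 1)}, \]

and

\[ E(X) = M'(0) = \lambda. \]

Similarly,

\[ M''(t) = \lambda e^t e^{\lambda (e^t - 1)} + \left(\lambda e^t\right)^2 e^{\lambda (e^t - 1)}. \]

Hence

\[ M''(0) = E(X^2) = \lambda^2 + \lambda. \]

Therefore

\[ Var(X) = E(X^2) - \left(E(X)\right)^2 = \lambda^2 + \lambda - \lambda^2 = \lambda. \]

This completes the proof.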

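Example 5.20. A random variable $X$ has a Poisson distribution with a mean of 3. What is the probability that $X$ is bounded by 1 and 3, that is, $P(1 \le X \le 3)$?

Answer:

\[ \mu_X = 3 = \lambda \qquad \text{and} \qquad f(x) = \frac{\lambda^x e^{-\lambda}}{x!}. \]

Hence

\[ f(x) = \frac{3^x e^{-3}}{x!}, \qquad x = 0, 1, 2, \ldots \]

Therefore

\[
\begin{aligned}
P(1 \le X \le 3) &= f(1) + f(2) + f(3) \\
&= 3 e^{-3} + \frac{9}{2} e^{-3} + \frac{27}{6} e^{-3} \\
&= 12 e^{-3}.
\end{aligned}
\]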



Example 5.21. The number of traffic accidents per week in a small city has a Poisson distribution with mean equal to 3. What is the probability that exactly 2 accidents occur in 2 weeks?

Answer: The mean number of traffic accidents per week is 3. Thus, the mean number of accidents in two weeks is

\[ \lambda = (3)(2) = 6. \]

Since

\[ f(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \]

we get

\[ f(2) = \frac{6^2 e^{-6}}{2!} = 18 e^{-6}. \]




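Example 5.22. Let $X$ have a Poisson distribution with parameter $\lambda = 1$. What is the probability that $X \ge 2$ given that $X \le 4$?

Answer:

\[ P(X \ge 2 \mid X \le 4) = \frac{P(2 \le X \le 4)}{P(X \le 4)}. \]

Here

\[ P(2 \le X \le 4) = \sum_{x=2}^{4} \frac{\lambda^x e^{-\lambda}}{x!} = \frac{1}{e} \sum_{x=2}^{4} \frac{1}{x!} = \frac{17}{24 e}. \]

Similarly

\[ P(X \le 4) = \frac{1}{e} \sum_{x=0}^{4} \frac{1}{x!} = \frac{65}{24 e}. \]

Therefore, we have

\[ P(X \ge 2 \mid X \le 4) = \frac{17}{65}. \]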


Example 5.23. If the moment generating function of a random variable $X$ is $M(t) = e^{4.6 (e^t - 1)}$, then what are the mean and variance of $X$? What is the probability that $X$ is between 3 and 6, that is $P(3 < X < 6)$?

Answer: Since the moment generating function of $X$ is given by

\[ M(t) = e^{4.6 (e^t - 1)} \]

we conclude that $X \sim POI(\lambda)$ with $\lambda = 4.6$. Thus, by Theorem 5.8, we get

\[ E(X) = 4.6 = Var(X). \]

\[
\begin{aligned}
P(3 < X < 6) &= f(4) + f(5) \\
&= F(5) - F(3) \\
&= 0.686 - 0.326 \\
&= 0.36.
\end{aligned}
\]
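The table lookups used in the Poisson examples can be replaced by a direct computation. The following sketch, with function names of our own choosing, recomputes the probability in Example 5.23 and the conditional probability of a Poisson variable with $\lambda = 1$ being at least 2 given that it is at most 4.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ POI(lam)."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x) by summing the density from 0 to x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

# P(3 < X < 6) = f(4) + f(5) = F(5) - F(3) with lam = 4.6.
direct = poisson_pmf(4, 4.6) + poisson_pmf(5, 4.6)
via_cdf = poisson_cdf(5, 4.6) - poisson_cdf(3, 4.6)

# P(X >= 2 | X <= 4) with lam = 1.
conditional = (poisson_cdf(4, 1) - poisson_cdf(1, 1)) / poisson_cdf(4, 1)
print(round(direct, 3), round(via_cdf, 3), conditional)
```

The first two values agree with each other and with the tabulated answer to two decimals; the conditional probability equals $17/65$ up to floating-point rounding.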


5.7. Riemann Zeta Distribution 


The zeta distribution was used by the Italian economist Vilfredo Pareto 
(1848-1923) to study the distribution of family incomes of a given country. 

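Definition 5.7. A random variable $X$ is said to have a Riemann zeta distribution if its probability density function is of the form

\[ f(x) = \frac{1}{\zeta(\alpha + 1)} \, x^{-(\alpha + 1)}, \qquad x = 1, 2, 3, \ldots, \infty \]

where $\alpha > 0$ is a parameter and

\[ \zeta(s) = 1 + \left(\frac{1}{2}\right)^s + \left(\frac{1}{3}\right)^s + \left(\frac{1}{4}\right)^s + \cdots + \left(\frac{1}{x}\right)^s + \cdots \]

is the well known Riemann zeta function. A random variable having a Riemann zeta distribution with parameter $\alpha$ will be denoted by $X \sim RIZ(\alpha)$.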


The following figures illustrate the Riemann zeta distribution for the case $\alpha = 2$ and $\alpha = 1$.

The following theorem is easy to prove and we leave its proof to the reader.

Theorem 5.9. If $X \sim RIZ(\alpha)$, then

\[
\begin{aligned}
E(X) &= \frac{\zeta(\alpha)}{\zeta(\alpha + 1)} \\
Var(X) &= \frac{\zeta(\alpha - 1) \, \zeta(\alpha + 1) - \left(\zeta(\alpha)\right)^2}{\left(\zeta(\alpha + 1)\right)^2}.
\end{aligned}
\]

Remark 5.1. If $0 < \alpha \le 1$, then $\zeta(\alpha) = \infty$. Hence if $X \sim RIZ(\alpha)$ and the parameter $\alpha \le 1$, then the variance of $X$ is infinite.
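The zeta function has no elementary closed form, but it is easy to approximate for $s > 1$. The sketch below is one possible approach and not the only one: it sums the first terms of the series and estimates the tail by an integral with a half-term correction (the leading terms of the Euler-Maclaurin formula). The function names and the number of terms are our own assumptions.

```python
import math

def zeta(s, terms=1000):
    """Approximate the Riemann zeta function for s > 1.

    Sums the first `terms` terms and adds an integral estimate of the
    tail with a half-term correction. Returns infinity for s <= 1,
    where the series diverges.
    """
    if s <= 1:
        return math.inf
    n = terms
    head = sum(k**(-s) for k in range(1, n + 1))
    return head + n**(1 - s) / (s - 1) - 0.5 * n**(-s)

def riz_pmf(x, alpha):
    """P(X = x) for X ~ RIZ(alpha), x = 1, 2, 3, ..."""
    return x**(-(alpha + 1)) / zeta(alpha + 1)

alpha = 2
mean = zeta(alpha) / zeta(alpha + 1)    # E(X) from Theorem 5.9
print(zeta(2), math.pi**2 / 6, mean)
```

The comparison with the known value $\zeta(2) = \pi^2 / 6$ gives a quick sense of the accuracy of the approximation.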

5.8. Review Exercises 

1. What is the probability of getting exactly 3 heads in 5 flips of a fair coin? 

2. On six successive flips of a fair coin, what is the probability of observing 
3 heads and 3 tails? 

3. What is the probability that in 3 rolls of a pair of six-sided dice, exactly 
one total of 7 is rolled? 

4. What is the probability of getting exactly four “sixes” when a die is rolled 
7 times? 

5. In a family of 4 children, what is the probability that there will be exactly 
two boys? 

6. If a fair coin is tossed 4 times, what is the probability of getting at least 
two heads? 

7. In Louisville the probability that a thunderstorm will occur on any day 
during the spring is 0.05. Assuming independence, what is the probability 
that the first thunderstorm occurs on April 5? (Assume spring begins on 
March 1.) 

8. A ball is drawn from an urn containing 3 white and 3 black balls. After 
the ball is drawn, it is then replaced and another ball is drawn. This goes on 
indefinitely. What is the probability that, of the first 4 balls drawn, exactly 
2 are white? 

9. What is the probability that a person flipping a fair coin requires four 
tosses to get a head? 

10. Assume that hitting oil at one drilling location is independent of another, and that, in a particular region, the probability of success at any individual location is 0.3. Suppose the drilling company believes that a venture will be profitable if the number of wells drilled until the second success occurs is less than or equal to 7. What is the probability that the venture will be profitable?

11. Suppose an experiment consists of tossing a fair coin until three heads 
occur. What is the probability that the experiment ends after exactly six 
flips of the coin with a head on the fifth toss as well as on the sixth? 

12. Customers at Fred’s Cafe win a $100 prize if their cash register receipts show a star on each of the five consecutive days Monday, Tuesday, ..., Friday in any one week. The cash register is programmed to print stars on a randomly selected 10% of the receipts. If Mark eats at Fred’s Cafe once each day for four consecutive weeks, and if the appearance of the stars is an independent process, what is the probability that Mark will win at least $100?

13. If a fair coin is tossed repeatedly, what is the probability that the third 
head occurs on the n th toss? 

14. Suppose 30 percent of all electrical fuses manufactured by a certain 
company fail to meet municipal building standards. What is the probability 
that in a random sample of 10 fuses, exactly 3 will fail to meet municipal 
building standards? 

15. A bin of 10 light bulbs contains 4 that are defective. If 3 bulbs are chosen 
without replacement from the bin, what is the probability that exactly k of 
the bulbs in the sample are defective? 

16. Let X denote the number of independent rolls of a fair die required to 
obtain the first “3”. What is P(X > 6)? 

17. The number of automobiles crossing a certain intersection during any 
time interval of length t minutes between 3:00 P.M. and 4:00 P.M. has a 
Poisson distribution with mean t. Let W be time elapsed after 3:00 P.M. 
before the first automobile crosses the intersection. What is the probability 
that W is less than 2 minutes? 

18. In rolling one die repeatedly, what is the probability of getting the third 
six on the X th roll? 

19. A coin is tossed 6 times. What is the probability that the number of heads in the first 3 throws is the same as the number in the last 3 throws?

20. One hundred pennies are being distributed independently and at random into 30 boxes, labeled 1, 2, ..., 30. What is the probability that there are exactly 3 pennies in box number 1?

21. The density function of a certain random variable is 

r (ID (0.2) 4 *(0.8) 22 - 4 " if 1 = 0, l f 
/(*) = s 

0 otherwise. 

What is the expected value of X 2 ? 

22. If Mx(t) = k (2 + Se*) 100 , what is the value of k7 What is the variance 
of the random variable X ? 

23. If Mx(t) = k ^ 7 _f 5e , ^ , what is the value of fc? What is the variance of 
the random variable X ? 

24. If for a Poisson distribution 2/(0) + /(2) = 2/(1), what is the mean of 
the distribution? 

25. The number of hits, X, per baseball game, has a Poisson distribution. 
If the probability of a no-hit game is |, what is the probability of having 2 
or more hits in specified game? 

26. Suppose X has a Poisson distribution with a standard deviation of 4. 
What is the conditional probability that X is exactly 1 given that X > 1 ? 

27. A die is loaded in such a way that the probability of the face with $j$ dots turning up is proportional to $j^2$ for $j = 1, 2, 3, 4, 5, 6$. What is the probability of rolling at most three sixes in 5 independent casts of this die?

28. A die is loaded in such a way that the probability of the face with $j$ dots turning up is proportional to $j^2$ for $j = 1, 2, 3, 4, 5, 6$. What is the probability of getting the third six on the 7th roll of this loaded die?

Chapter 6

SOME SPECIAL CONTINUOUS DISTRIBUTIONS


In this chapter, we study some well known continuous probability density 
functions. We want to study them because they arise in many applications. 
We begin with the simplest probability density function. 

6.1. Uniform Distribution 

Let the random variable $X$ denote the outcome when a point is selected at random from an interval $[a, b]$. We want to find the probability of the event $X \le x$, that is we would like to determine the probability that the point selected from $[a, b]$ would be less than or equal to $x$. To compute this probability, we need a probability measure $\mu$ that satisfies the three axioms of Kolmogorov (namely nonnegativity, normalization and countable additivity). For continuous variables, the events are intervals or unions of intervals. The length of the interval when normalized satisfies all the three axioms and thus it can be used as a probability measure for one-dimensional random variables. Hence

\[ P(X \le x) = \frac{\text{length of } [a, x]}{\text{length of } [a, b]}. \]

Thus, the cumulative distribution function $F$ is

\[ F(x) = P(X \le x) = \frac{x - a}{b - a}, \qquad a \le x \le b, \]

where $a$ and $b$ are any two real constants with $a < b$. To determine the probability density function from the cumulative distribution function, we calculate the derivative of $F(x)$. Hence

\[ f(x) = \frac{d}{dx} F(x) = \frac{1}{b - a}, \qquad a \le x \le b. \]

Definition 6.1. A random variable $X$ is said to be uniform on the interval $[a, b]$ if its probability density function is of the form

\[ f(x) = \frac{1}{b - a}, \qquad a \le x \le b, \]

where $a$ and $b$ are constants. We denote a random variable $X$ with the uniform distribution on the interval $[a, b]$ as $X \sim UNIF(a, b)$.


PDF of a Uniform Random Variable on [2,3] 
The uniform distribution provides a probability model for selecting points
at random from an interval [a, b]. An important application of the uniform
distribution lies in random number generation. The following theorem gives
the mean, variance and moment generating function of a uniform random
variable.

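Theorem 6.1. If X is uniform on the interval [a, b] then the mean, variance
and moment generating function of X are given by

\[
E(X) = \frac{b + a}{2}, \qquad
\operatorname{Var}(X) = \frac{(b - a)^2}{12}, \qquad
M(t) = \begin{cases} 1 & \text{if } t = 0 \\[4pt] \dfrac{e^{tb} - e^{ta}}{t\,(b - a)} & \text{if } t \ne 0. \end{cases}
\]

Proof:

\[
\begin{aligned}
E(X) &= \int_a^b x\, f(x)\, dx \\
&= \int_a^b x\, \frac{1}{b - a}\, dx \\
&= \frac{1}{b - a} \left[ \frac{x^2}{2} \right]_a^b \\
&= \frac{1}{2}\,(b + a).
\end{aligned}
\]
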
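\[
\begin{aligned}
E(X^2) &= \int_a^b x^2 f(x)\, dx \\
&= \int_a^b x^2\, \frac{1}{b - a}\, dx \\
&= \frac{1}{b - a} \left[ \frac{x^3}{3} \right]_a^b \\
&= \frac{1}{b - a}\, \frac{b^3 - a^3}{3} \\
&= \frac{1}{b - a}\, \frac{(b - a)\,(b^2 + ba + a^2)}{3} \\
&= \frac{1}{3}\,(b^2 + ba + a^2).
\end{aligned}
\]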

Hence, the variance of X is given by 

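\[
\begin{aligned}
\operatorname{Var}(X) &= E(X^2) - \left( E(X) \right)^2 \\
&= \frac{1}{3}\,(b^2 + ba + a^2) - \frac{(b + a)^2}{4} \\
&= \frac{1}{12} \left[ 4b^2 + 4ba + 4a^2 - 3a^2 - 3b^2 - 6ba \right] \\
&= \frac{1}{12} \left[ b^2 - 2ba + a^2 \right] \\
&= \frac{1}{12}\,(b - a)^2.
\end{aligned}
\]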

Next, we compute the moment generating function of X. First, we handle
the case $t \ne 0$. Assume $t \ne 0$. Hence

\[
\begin{aligned}
M(t) &= E\left( e^{tX} \right) \\
&= \int_a^b e^{tx}\, \frac{1}{b - a}\, dx \\
&= \frac{1}{b - a} \left[ \frac{e^{tx}}{t} \right]_a^b \\
&= \frac{e^{tb} - e^{ta}}{t\,(b - a)}.
\end{aligned}
\]

If t = 0, we know that M(0) = 1, hence we get

\[
M(t) = \begin{cases} 1 & \text{if } t = 0 \\[4pt] \dfrac{e^{tb} - e^{ta}}{t\,(b - a)} & \text{if } t \ne 0 \end{cases}
\]

and this completes the proof.
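
The formulas of Theorem 6.1 can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names (`uniform_mean`, `uniform_var`, `uniform_mgf`, `expect_uniform`) and the choice a = 2, b = 3, t = 0.7 are illustrative assumptions. It compares the closed forms with a midpoint-rule approximation of the defining integrals.

```python
import math


def uniform_mean(a, b):
    """Mean of UNIF(a, b) as stated in Theorem 6.1."""
    return (a + b) / 2


def uniform_var(a, b):
    """Variance of UNIF(a, b) as stated in Theorem 6.1."""
    return (b - a) ** 2 / 12


def uniform_mgf(t, a, b):
    """Moment generating function of UNIF(a, b) as stated in Theorem 6.1."""
    if t == 0:
        return 1.0
    return (math.exp(t * b) - math.exp(t * a)) / (t * (b - a))


def expect_uniform(g, a, b, n=20_000):
    """Approximate E[g(X)] for X ~ UNIF(a, b) with the midpoint rule."""
    h = (b - a) / n
    total = sum(g(a + (i + 0.5) * h) for i in range(n))
    return total * h / (b - a)


a, b, t = 2.0, 3.0, 0.7   # illustrative values only
mean_num = expect_uniform(lambda x: x, a, b)
var_num = expect_uniform(lambda x: x * x, a, b) - mean_num ** 2
mgf_num = expect_uniform(lambda x: math.exp(t * x), a, b)

print(mean_num, uniform_mean(a, b))
print(var_num, uniform_var(a, b))
print(mgf_num, uniform_mgf(t, a, b))
```

Each printed pair should agree to many decimal places, since the midpoint rule is very accurate for these smooth integrands.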


MGF of X~UNIF(2, 3) 



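Example 6.1. Suppose Y ~ UNIF(0, 1) and $Y = \frac{1}{4} X^2$. What is the
probability density function of X?

Answer: We shall find the probability density function of X through the
cumulative distribution function of Y. Taking X to be nonnegative, the
cumulative distribution function of X for $0 \le x \le 2$ is given by

\[
\begin{aligned}
F(x) &= P(X \le x) \\
&= P\left( X^2 \le x^2 \right) \\
&= P\left( \tfrac{1}{4} X^2 \le \tfrac{1}{4} x^2 \right) \\
&= P\left( Y \le \frac{x^2}{4} \right) \\
&= \int_0^{x^2/4} f(y)\, dy \\
&= \int_0^{x^2/4} dy \\
&= \frac{x^2}{4}.
\end{aligned}
\]

Thus

\[
f(x) = \frac{d}{dx} F(x) = \frac{x}{2}.
\]

Hence the probability density function of X is given by

\[
f(x) = \begin{cases} \dfrac{x}{2} & \text{for } 0 \le x \le 2 \\[4pt] 0 & \text{otherwise.} \end{cases}
\]

Graph of the PDF of X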



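Example 6.2. If X has a uniform distribution on the interval from 0 to 10,
then what is $P\left( X + \frac{10}{X} \ge 7 \right)$?

Answer: Since X ~ UNIF(0, 10), the probability density function of X is
$f(x) = \frac{1}{10}$ for $0 \le x \le 10$. Hence

\[
\begin{aligned}
P\left( X + \frac{10}{X} \ge 7 \right) &= P\left( X^2 + 10 \ge 7X \right) \\
&= P\left( X^2 - 7X + 10 \ge 0 \right) \\
&= P\left( (X - 5)(X - 2) \ge 0 \right) \\
&= P(X \le 2 \text{ or } X \ge 5) \\
&= 1 - P(2 \le X \le 5) \\
&= 1 - \int_2^5 \frac{1}{10}\, dx \\
&= 1 - \frac{3}{10} \\
&= \frac{7}{10}.
\end{aligned}
\]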


Example 6.3. If X is uniform on the interval from 0 to 3, what is the
probability that the quadratic equation $4t^2 + 4tX + X + 2 = 0$ has real
solutions?

Answer: Since X ~ UNIF(0, 3), the probability density function of X is

\[
f(x) = \begin{cases} \dfrac{1}{3} & 0 \le x \le 3 \\[4pt] 0 & \text{otherwise.} \end{cases}
\]

The quadratic equation $4t^2 + 4tX + X + 2 = 0$ has real solutions if the
discriminant of this equation is nonnegative. That is

\[
16X^2 - 16(X + 2) \ge 0,
\]

which is

\[
X^2 - X - 2 \ge 0.
\]

From this, we get

\[
(X - 2)(X + 1) \ge 0.
\]

The probability that the quadratic equation $4t^2 + 4tX + X + 2 = 0$ has real
roots is equivalent to

\[
\begin{aligned}
P\left( (X - 2)(X + 1) \ge 0 \right) &= P(X \le -1 \text{ or } X \ge 2) \\
&= P(X \le -1) + P(X \ge 2) \\
&= \int_{-\infty}^{-1} f(x)\, dx + \int_2^3 f(x)\, dx \\
&= 0 + \int_2^3 \frac{1}{3}\, dx \\
&= \frac{1}{3} = 0.3333.
\end{aligned}
\]
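
Probabilities such as those in Examples 6.2 and 6.3 can be sanity-checked by simulation. The sketch below is an illustration only: the helper `estimate`, the seed and the number of trials are assumptions, and a Monte Carlo estimate is only expected to land near the exact values 7/10 and 1/3, not to reproduce them.

```python
import random


def estimate(event, low, high, trials=200_000, seed=0):
    """Estimate P(event(X)) for X ~ UNIF(low, high) by simulation."""
    rng = random.Random(seed)
    hits = sum(event(rng.uniform(low, high)) for _ in range(trials))
    return hits / trials


# Example 6.2: X + 10/X >= 7 is written as X^2 + 10 >= 7X (valid for X > 0),
# which also avoids dividing by a sampled value of zero.
p_62 = estimate(lambda x: x * x + 10 >= 7 * x, 0, 10)

# Example 6.3: real roots exist when the discriminant 16X^2 - 16(X + 2) >= 0.
p_63 = estimate(lambda x: 16 * x * x - 16 * (x + 2) >= 0, 0, 3)

print(p_62, p_63)
```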


Theorem 6.2. If X is a continuous random variable with a strictly increasing
cumulative distribution function F(x), then the random variable Y, defined
by

\[
Y = F(X)
\]

has the uniform distribution on the interval [0, 1].

Proof: Since F is strictly increasing, the inverse $F^{-1}(x)$ of F(x) exists. We
want to show that the probability density function g(y) of Y is g(y) = 1.
First, we find the cumulative distribution function G(y) of Y.

\[
\begin{aligned}
G(y) &= P(Y \le y) \\
&= P\left( F(X) \le y \right) \\
&= P\left( X \le F^{-1}(y) \right) \\
&= F\left( F^{-1}(y) \right) \\
&= y.
\end{aligned}
\]

Hence the probability density function of Y is given by

\[
g(y) = \frac{d}{dy} G(y) = \frac{d}{dy}\, y = 1.
\]
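
Theorem 6.2 can be illustrated by simulation. The following sketch is an assumption-laden example, not taken from the text: it draws X from an exponential distribution with mean 1, whose cumulative distribution function $F(x) = 1 - e^{-x}$ is strictly increasing on x > 0, and checks that Y = F(X) behaves like a UNIF(0, 1) sample (mean near 1/2, variance near 1/12, empirical CDF near the identity).

```python
import math
import random

rng = random.Random(1)   # fixed seed so the run is reproducible

# X ~ exponential with mean 1; F(x) = 1 - exp(-x) for x > 0.
xs = [rng.expovariate(1.0) for _ in range(100_000)]
ys = [1.0 - math.exp(-x) for x in xs]   # Y = F(X)

mean_y = sum(ys) / len(ys)
var_y = sum((y - mean_y) ** 2 for y in ys) / len(ys)

# Empirical CDF of Y at a few points; for UNIF(0, 1) it should be close to q.
ecdf = {q: sum(y <= q for y in ys) / len(ys) for q in (0.1, 0.5, 0.9)}

print(mean_y, var_y, ecdf)
```

Read in the other direction, the same relationship is what makes uniform random numbers useful for generating samples from other distributions: applying $F^{-1}$ to a uniform sample gives a sample with distribution function F.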


The following problem can be solved using this theorem but we solve it 
without this theorem. 

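Example 6.4. If the probability density function of X is

\[
f(x) = \frac{e^{-x}}{\left( 1 + e^{-x} \right)^2}, \qquad -\infty < x < \infty,
\]

then what is the probability density function of $Y = \dfrac{1}{1 + e^{-X}}$?

Answer: The cumulative distribution function of Y is given by

\[
\begin{aligned}
G(y) &= P(Y \le y) \\
&= P\left( \frac{1}{1 + e^{-X}} \le y \right) \\
&= P\left( 1 + e^{-X} \ge \frac{1}{y} \right) \\
&= P\left( e^{-X} \ge \frac{1 - y}{y} \right) \\
&= P\left( -X \ge \ln \frac{1 - y}{y} \right) \\
&= P\left( X \le -\ln \frac{1 - y}{y} \right) \\
&= \int_{-\infty}^{-\ln \frac{1 - y}{y}} \frac{e^{-x}}{\left( 1 + e^{-x} \right)^2}\, dx \\
&= \left[ \frac{1}{1 + e^{-x}} \right]_{-\infty}^{-\ln \frac{1 - y}{y}} \\
&= \frac{1}{1 + \frac{1 - y}{y}} \\
&= y.
\end{aligned}
\]

Hence, the probability density function of Y is given by

\[
f(y) = \begin{cases} 1 & \text{if } 0 < y < 1 \\ 0 & \text{otherwise.} \end{cases}
\]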
Example 6.5. A box is to be constructed so that its height is 10 inches and
its base is X inches by X inches. If X has a uniform distribution over the
interval (2, 8), then what is the expected volume of the box in cubic inches?

Answer: Since X ~ UNIF(2, 8),

\[
f(x) = \frac{1}{8 - 2} = \frac{1}{6} \qquad \text{on } (2, 8).
\]

The volume V of the box is

\[
V = 10\, X^2.
\]

Hence

\[
\begin{aligned}
E(V) &= E\left( 10 X^2 \right) \\
&= 10\, E\left( X^2 \right) \\
&= 10 \int_2^8 x^2\, \frac{1}{6}\, dx \\
&= \frac{10}{6} \left[ \frac{x^3}{3} \right]_2^8 \\
&= \frac{10}{18} \left[ 8^3 - 2^3 \right] = (5)(8)(7) = 280 \text{ cubic inches.}
\end{aligned}
\]

Example 6.6. Two numbers are chosen independently and at random from
the interval (0, 1). What is the probability that the two numbers differ by
more than $\frac{1}{2}$?

Answer: See figure below:

Choose x from the x-axis between 0 and 1, and choose y from the y-axis
between 0 and 1. The probability that the two numbers differ by more than
$\frac{1}{2}$ is equal to the area of the shaded region, which consists of the two
corner triangles of the unit square where $|x - y| > \frac{1}{2}$, each of area $\frac{1}{8}$. Thus

\[
P\left( |x - y| > \tfrac{1}{2} \right) = \frac{1}{8} + \frac{1}{8} = \frac{1}{4}.
\]

6.2. Gamma Distribution 

The gamma distribution involves the notion of gamma function. First,
we develop the notion of gamma function and study some of its well known
properties. The gamma function, $\Gamma(z)$, is a generalization of the notion of
factorial. The gamma function is defined as

\[
\Gamma(z) := \int_0^\infty x^{z - 1} e^{-x}\, dx,
\]

where z is a positive real number (that is, z > 0). The condition z > 0 is
assumed for the convergence of the integral. Although the integral does not
converge for z < 0, it can be shown by using an alternative definition of
gamma function that it is defined for all $z \in \mathbb{R} \setminus \{0, -1, -2, -3, \ldots\}$.

The integral on the right side of the above expression is called Euler's
second integral, after the Swiss mathematician Leonhard Euler (1707-1783).
The graph of the gamma function is shown below. Observe that the zero and
negative integers correspond to vertical asymptotes of the graph of gamma
function.


Graph of the Gamma Function 



Lemma 6.1. $\Gamma(1) = 1$.

Proof:

\[
\Gamma(1) = \int_0^\infty x^0 e^{-x}\, dx = \left[ -e^{-x} \right]_0^\infty = 1.
\]

Lemma 6.2. The gamma function $\Gamma(z)$ satisfies the functional equation
$\Gamma(z) = (z - 1)\, \Gamma(z - 1)$ for all real numbers z > 1.

Proof: Let z be a real number such that z > 1, and consider

\[
\begin{aligned}
\Gamma(z) &= \int_0^\infty x^{z - 1} e^{-x}\, dx \\
&= \left[ -x^{z - 1} e^{-x} \right]_0^\infty + \int_0^\infty (z - 1)\, x^{z - 2} e^{-x}\, dx \\
&= (z - 1) \int_0^\infty x^{z - 2} e^{-x}\, dx \\
&= (z - 1)\, \Gamma(z - 1).
\end{aligned}
\]

Although we have proved this lemma for all real z > 1, actually this
lemma holds also for all real numbers $z \in \mathbb{R} \setminus \{1, 0, -1, -2, -3, \ldots\}$.

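Lemma 6.3. $\Gamma\left( \frac{1}{2} \right) = \sqrt{\pi}$.

Proof: We want to show that

\[
\Gamma\left( \tfrac{1}{2} \right) = \int_0^\infty \frac{e^{-x}}{\sqrt{x}}\, dx
\]

is equal to $\sqrt{\pi}$. We substitute $y = \sqrt{x}$, hence the above integral becomes

\[
\Gamma\left( \tfrac{1}{2} \right) = \int_0^\infty \frac{e^{-x}}{\sqrt{x}}\, dx
= 2 \int_0^\infty e^{-y^2}\, dy, \qquad \text{where } y = \sqrt{x}.
\]

Hence

\[
\Gamma\left( \tfrac{1}{2} \right) = 2 \int_0^\infty e^{-u^2}\, du
\]

and also

\[
\Gamma\left( \tfrac{1}{2} \right) = 2 \int_0^\infty e^{-v^2}\, dv.
\]

Multiplying the above two expressions, we get

\[
\left( \Gamma\left( \tfrac{1}{2} \right) \right)^2 = 4 \int_0^\infty \int_0^\infty e^{-(u^2 + v^2)}\, du\, dv.
\]

Now we change the integral into polar form by the transformation
$u = r \cos(\theta)$ and $v = r \sin(\theta)$. The Jacobian of the transformation is

\[
\begin{aligned}
J(r, \theta) &= \det \begin{pmatrix} \dfrac{\partial u}{\partial r} & \dfrac{\partial u}{\partial \theta} \\[6pt] \dfrac{\partial v}{\partial r} & \dfrac{\partial v}{\partial \theta} \end{pmatrix}
= \det \begin{pmatrix} \cos(\theta) & -r \sin(\theta) \\ \sin(\theta) & r \cos(\theta) \end{pmatrix} \\
&= r \cos^2(\theta) + r \sin^2(\theta) \\
&= r.
\end{aligned}
\]

Hence, we get

\[
\begin{aligned}
\left( \Gamma\left( \tfrac{1}{2} \right) \right)^2 &= 4 \int_0^{\pi/2} \int_0^\infty e^{-r^2} J(r, \theta)\, dr\, d\theta \\
&= 4 \int_0^{\pi/2} \int_0^\infty e^{-r^2}\, r\, dr\, d\theta \\
&= 2 \int_0^{\pi/2} \int_0^\infty e^{-r^2}\, 2r\, dr\, d\theta \\
&= 2 \int_0^{\pi/2} \left[ \int_0^\infty e^{-r^2}\, dr^2 \right] d\theta \\
&= 2 \int_0^{\pi/2} \Gamma(1)\, d\theta \\
&= \pi.
\end{aligned}
\]

Therefore, we get

\[
\Gamma\left( \tfrac{1}{2} \right) = \sqrt{\pi}.
\]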


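Lemma 6.4. $\Gamma\left( -\frac{1}{2} \right) = -2 \sqrt{\pi}$.

Proof: By Lemma 6.2, we get

\[
\Gamma(z) = (z - 1)\, \Gamma(z - 1)
\]

for all $z \in \mathbb{R} \setminus \{1, 0, -1, -2, -3, \ldots\}$. Letting $z = \frac{1}{2}$, we get

\[
\Gamma\left( \tfrac{1}{2} \right) = \left( \tfrac{1}{2} - 1 \right) \Gamma\left( \tfrac{1}{2} - 1 \right)
\]

which is

\[
\Gamma\left( -\tfrac{1}{2} \right) = -2\, \Gamma\left( \tfrac{1}{2} \right) = -2 \sqrt{\pi}.
\]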


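Example 6.7. Evaluate $\Gamma\left( \frac{5}{2} \right)$.

Answer:

\[
\Gamma\left( \tfrac{5}{2} \right) = \frac{3}{2} \cdot \frac{1}{2}\, \Gamma\left( \tfrac{1}{2} \right) = \frac{3}{4} \sqrt{\pi}.
\]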


Example 6.8. Evaluate T (— |). 
Answer: Consider



Example 6.9. Evaluate $\Gamma(7.8)$.

Answer:

\[
\begin{aligned}
\Gamma(7.8) &= (6.8)(5.8)(4.8)(3.8)(2.8)(1.8)\, \Gamma(1.8) \\
&= (3625.7)\, \Gamma(1.8) \\
&= (3625.7)(0.9314) = 3376.9.
\end{aligned}
\]

Here we have used the gamma table to find $\Gamma(1.8)$ to be 0.9314.

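Example 6.10. If n is a natural number, then $\Gamma(n + 1) = n!$.

Answer:

\[
\begin{aligned}
\Gamma(n + 1) &= n\, \Gamma(n) \\
&= n\,(n - 1)\, \Gamma(n - 1) \\
&= n\,(n - 1)\,(n - 2)\, \Gamma(n - 2) \\
&= \cdots \\
&= n\,(n - 1)\,(n - 2) \cdots (1)\, \Gamma(1) \\
&= n!
\end{aligned}
\]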
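
The gamma function values derived above can be confirmed with the `math.gamma` function from the Python standard library. This is only a quick numerical check under the assumption that floating-point accuracy is sufficient; the variable names are illustrative.

```python
import math

# Lemma 6.1 and Lemma 6.3
g_one = math.gamma(1.0)          # expected 1
g_half = math.gamma(0.5)         # expected sqrt(pi)

# Lemma 6.4: Gamma(-1/2) = -2 sqrt(pi)
g_neg_half = math.gamma(-0.5)

# Example 6.9: peel off factors with Gamma(z) = (z - 1) Gamma(z - 1)
# until the argument lies in the usual table range between 1 and 2.
g_78 = 6.8 * 5.8 * 4.8 * 3.8 * 2.8 * 1.8 * math.gamma(1.8)

# Example 6.10: Gamma(n + 1) = n! for natural numbers n
factorial_ok = all(
    math.isclose(math.gamma(n + 1), math.factorial(n)) for n in range(1, 15)
)

print(g_one, g_half, math.sqrt(math.pi))
print(g_neg_half, -2 * math.sqrt(math.pi))
print(g_78, math.gamma(7.8), factorial_ok)
```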

Now we are ready to define the gamma distribution. 

Definition 6.2. A continuous random variable X is said to have a gamma
distribution if its probability density function is given by

\[
f(x) = \begin{cases} \dfrac{1}{\Gamma(\alpha)\, \theta^{\alpha}}\, x^{\alpha - 1} e^{-\frac{x}{\theta}} & \text{if } 0 < x < \infty \\[6pt] 0 & \text{otherwise,} \end{cases}
\]

where $\alpha > 0$ and $\theta > 0$. We denote a random variable with gamma
distribution as X ~ GAM($\theta$, $\alpha$). The following diagram shows the graph of the
gamma density for various values of the parameters $\theta$ and $\alpha$.
The following theorem gives the expected value, the variance, and the
moment generating function of the gamma random variable.

Theorem 6.3. If X ~ GAM($\theta$, $\alpha$), then

\[
E(X) = \theta \alpha, \qquad
\operatorname{Var}(X) = \theta^2 \alpha, \qquad
M(t) = \left( \frac{1}{1 - \theta t} \right)^{\alpha} \quad \text{if } t < \frac{1}{\theta}.
\]


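Proof: First, we derive the moment generating function of X and then we
compute the mean and variance of it. The moment generating function is

\[
\begin{aligned}
M(t) &= E\left( e^{tX} \right) \\
&= \int_0^\infty \frac{1}{\Gamma(\alpha)\, \theta^{\alpha}}\, x^{\alpha - 1} e^{-\frac{x}{\theta}}\, e^{tx}\, dx \\
&= \int_0^\infty \frac{1}{\Gamma(\alpha)\, \theta^{\alpha}}\, x^{\alpha - 1} e^{-\frac{1}{\theta}(1 - \theta t)\, x}\, dx \\
&= \int_0^\infty \frac{1}{\Gamma(\alpha)\, \theta^{\alpha}}\, \frac{\theta^{\alpha}}{(1 - \theta t)^{\alpha}}\, y^{\alpha - 1} e^{-y}\, dy, \qquad \text{where } y = \frac{1}{\theta}\,(1 - \theta t)\, x \\
&= \frac{1}{(1 - \theta t)^{\alpha}} \int_0^\infty \frac{1}{\Gamma(\alpha)}\, y^{\alpha - 1} e^{-y}\, dy \\
&= \frac{1}{(1 - \theta t)^{\alpha}},
\end{aligned}
\]

since the integrand is GAM(1, $\alpha$).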
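The first derivative of the moment generating function is

\[
\begin{aligned}
M'(t) &= \frac{d}{dt}\, (1 - \theta t)^{-\alpha} \\
&= (-\alpha)\, (1 - \theta t)^{-\alpha - 1}\, (-\theta) \\
&= \alpha \theta\, (1 - \theta t)^{-(\alpha + 1)}.
\end{aligned}
\]

Hence from above, we find the expected value of X to be

\[
E(X) = M'(0) = \alpha \theta.
\]

Similarly,

\[
\begin{aligned}
M''(t) &= \frac{d}{dt} \left[ \alpha \theta\, (1 - \theta t)^{-(\alpha + 1)} \right] \\
&= \alpha \theta\, (\alpha + 1)\, \theta\, (1 - \theta t)^{-(\alpha + 2)} \\
&= \alpha\, (\alpha + 1)\, \theta^2\, (1 - \theta t)^{-(\alpha + 2)}.
\end{aligned}
\]

Thus, the variance of X is

\[
\begin{aligned}
\operatorname{Var}(X) &= M''(0) - \left( M'(0) \right)^2 \\
&= \alpha\, (\alpha + 1)\, \theta^2 - \alpha^2 \theta^2 \\
&= \alpha\, \theta^2
\end{aligned}
\]

and the proof of the theorem is now complete.

The figure below illustrates the graphs of the moment generating function for
various values of the parameters.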
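
Theorem 6.3 can be checked numerically by integrating against the gamma density. The sketch below is illustrative only: the helper names, the parameter values $\theta = 2$, $\alpha = 2.5$, t = 0.1, and the truncation of the integral at 200 are all assumptions chosen so that the neglected tail is negligible.

```python
import math


def gamma_pdf(x, theta, alpha):
    """Density of GAM(theta, alpha): theta is the scale, alpha the shape."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / theta) / (math.gamma(alpha) * theta ** alpha)


def expect_gamma(g, theta, alpha, upper=200.0, n=200_000):
    """Midpoint-rule approximation of E[g(X)]; the tail beyond `upper` is ignored."""
    h = upper / n
    return h * sum(
        g(x) * gamma_pdf(x, theta, alpha) for x in ((i + 0.5) * h for i in range(n))
    )


theta, alpha, t = 2.0, 2.5, 0.1   # t must satisfy t < 1/theta
mean = expect_gamma(lambda x: x, theta, alpha)
var = expect_gamma(lambda x: x * x, theta, alpha) - mean ** 2
mgf = expect_gamma(lambda x: math.exp(t * x), theta, alpha)

print(mean, theta * alpha)                 # E(X) = theta * alpha
print(var, theta ** 2 * alpha)             # Var(X) = theta^2 * alpha
print(mgf, (1 - theta * t) ** (-alpha))    # M(t) = (1 - theta t)^(-alpha)
```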



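Example 6.11. Let X have the density function

\[
f(x) = \begin{cases} \dfrac{1}{\Gamma(\alpha)\, \theta^{\alpha}}\, x^{\alpha - 1} e^{-\frac{x}{\theta}} & \text{if } 0 < x < \infty \\[6pt] 0 & \text{otherwise,} \end{cases}
\]

where $\alpha > 0$ and $\theta > 0$. If $\alpha = 4$, what is the mean of $\frac{1}{X^3}$?

Answer:

\[
\begin{aligned}
E\left( X^{-3} \right) &= \int_0^\infty \frac{1}{x^3}\, f(x)\, dx \\
&= \int_0^\infty \frac{1}{x^3}\, \frac{1}{\Gamma(4)\, \theta^4}\, x^3 e^{-\frac{x}{\theta}}\, dx \\
&= \frac{1}{3!\, \theta^4} \int_0^\infty e^{-\frac{x}{\theta}}\, dx \\
&= \frac{1}{3!\, \theta^3} \int_0^\infty \frac{1}{\theta}\, e^{-\frac{x}{\theta}}\, dx \\
&= \frac{1}{3!\, \theta^3}
\end{aligned}
\]

since the integrand is GAM($\theta$, 1).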


Definition 6.3. A continuous random variable is said to be an exponential
random variable with parameter $\theta$ if its probability density function is of the
form

\[
f(x) = \begin{cases} \dfrac{1}{\theta}\, e^{-\frac{x}{\theta}} & \text{if } x > 0 \\[6pt] 0 & \text{otherwise,} \end{cases}
\]

where $\theta > 0$. If a random variable X has an exponential density function
with parameter $\theta$, then we denote it by writing X ~ EXP($\theta$).


Exponential Distributions 



An exponential distribution is a special case of the gamma distribution.
If the parameter $\alpha = 1$, then the gamma distribution reduces to the
exponential distribution. Hence most of the information about an exponential
distribution can be obtained from the gamma distribution.

Example 6.12. What is the cumulative distribution function of a random
variable which has an exponential distribution with variance 25?
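Answer: Since an exponential distribution is a special case of the gamma
distribution with $\alpha = 1$, from Theorem 6.3, we get $\operatorname{Var}(X) = \theta^2$. But this
is given to be 25. Thus, $\theta^2 = 25$ or $\theta = 5$. Hence, the probability density
function of X is $f(x) = \frac{1}{5}\, e^{-\frac{x}{5}}$ for x > 0 and zero otherwise, and
the cumulative distribution function for x > 0 is

\[
\begin{aligned}
F(x) &= \int_0^x f(t)\, dt \\
&= \int_0^x \frac{1}{5}\, e^{-\frac{t}{5}}\, dt \\
&= \frac{1}{5} \left[ -5\, e^{-\frac{t}{5}} \right]_0^x \\
&= 1 - e^{-\frac{x}{5}}.
\end{aligned}
\]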

Example 6.13. If the random variable X has a gamma distribution with
parameters $\alpha = 1$ and $\theta = 1$, then what is the probability that X is between
its mean and median?

Answer: Since X ~ GAM(1, 1), the probability density function of X is

\[
f(x) = \begin{cases} e^{-x} & \text{if } x > 0 \\ 0 & \text{otherwise.} \end{cases}
\]

Hence, the median q of X can be calculated from

\[
\begin{aligned}
\frac{1}{2} &= \int_0^q e^{-x}\, dx \\
&= \left[ -e^{-x} \right]_0^q \\
&= 1 - e^{-q}.
\end{aligned}
\]

Hence

\[
\frac{1}{2} = 1 - e^{-q}
\]

and from this, we get

\[
q = \ln 2.
\]

The mean of X can be found from Theorem 6.3.

\[
E(X) = \alpha \theta = 1.
\]

Hence the mean of X is 1 and the median of X is ln 2. Thus

\[
\begin{aligned}
P(\ln 2 \le X \le 1) &= \int_{\ln 2}^1 e^{-x}\, dx \\
&= \left[ -e^{-x} \right]_{\ln 2}^1 \\
&= e^{-\ln 2} - \frac{1}{e} \\
&= \frac{1}{2} - \frac{1}{e} \\
&= \frac{e - 2}{2e}.
\end{aligned}
\]


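Example 6.14. If the random variable X has a gamma distribution with
parameters $\alpha = 1$ and $\theta = 2$, then what is the probability density function
of the random variable $Y = e^X$?

Answer: First, we calculate the cumulative distribution function G(y) of Y.

\[
\begin{aligned}
G(y) &= P(Y \le y) \\
&= P\left( e^X \le y \right) \\
&= P(X \le \ln y) \\
&= \int_0^{\ln y} \frac{1}{2}\, e^{-\frac{x}{2}}\, dx \\
&= \frac{1}{2} \left[ -2\, e^{-\frac{x}{2}} \right]_0^{\ln y} \\
&= 1 - \frac{1}{e^{\frac{1}{2} \ln y}} \\
&= 1 - \frac{1}{\sqrt{y}}.
\end{aligned}
\]

Hence, the probability density function of Y is given by

\[
g(y) = \frac{d}{dy}\, G(y) = \frac{d}{dy} \left( 1 - \frac{1}{\sqrt{y}} \right) = \frac{1}{2\, y \sqrt{y}}.
\]

Thus, if X ~ GAM(2, 1), then the probability density function of $e^X$ is

\[
f(x) = \begin{cases} \dfrac{1}{2\, x \sqrt{x}} & \text{if } 1 \le x < \infty \\[6pt] 0 & \text{otherwise.} \end{cases}
\]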



Definition 6.4. A continuous random variable X is said to have a chi-square
distribution with r degrees of freedom if its probability density function is of
the form

\[
f(x) = \begin{cases} \dfrac{1}{\Gamma\left( \frac{r}{2} \right) 2^{\frac{r}{2}}}\, x^{\frac{r}{2} - 1} e^{-\frac{x}{2}} & \text{if } 0 < x < \infty \\[6pt] 0 & \text{otherwise,} \end{cases}
\]

where r > 0. If X has a chi-square distribution, then we denote it by writing
X ~ $\chi^2(r)$.

The gamma distribution reduces to the chi-square distribution if $\alpha = \frac{r}{2}$ and
$\theta = 2$. Thus, the chi-square distribution is a special case of the gamma
distribution. Further, if $r \to \infty$, then the chi-square distribution tends to
the normal distribution.
The chi-square distribution originated in the works of the British
statistician Karl Pearson (1857-1936), but it was originally discovered by the German
physicist F. R. Helmert (1843-1917).

Example 6.15. If X ~ GAM(1, 1), then what is the probability density
function of the random variable 2X?

Answer: We will use the moment generating method to find the distribution
of 2X. The moment generating function of a gamma random variable is given
by (see Theorem 6.3)

\[
M(t) = (1 - \theta t)^{-\alpha}, \qquad \text{if } t < \frac{1}{\theta}.
\]

Since X ~ GAM(1, 1), the moment generating function of X is given by

\[
M_X(t) = \frac{1}{1 - t}, \qquad t < 1.
\]

Hence, the moment generating function of 2X is

\[
\begin{aligned}
M_{2X}(t) &= M_X(2t) \\
&= \frac{1}{1 - 2t} \\
&= \frac{1}{(1 - 2t)^{\frac{2}{2}}} \\
&= \text{MGF of } \chi^2(2).
\end{aligned}
\]

Hence, if X is an exponential with parameter 1, then 2X is chi-square with
2 degrees of freedom.

Example 6.16. If X ~ $\chi^2(5)$, then what is the probability that X is between
1.145 and 12.83?

Answer: The probability of X between 1.145 and 12.83 can be calculated
from the following:

\[
\begin{aligned}
P(1.145 \le X \le 12.83) &= P(X \le 12.83) - P(X \le 1.145) \\
&= \int_0^{12.83} f(x)\, dx - \int_0^{1.145} f(x)\, dx \\
&= \int_0^{12.83} \frac{1}{\Gamma\left( \frac{5}{2} \right) 2^{\frac{5}{2}}}\, x^{\frac{5}{2} - 1} e^{-\frac{x}{2}}\, dx
 - \int_0^{1.145} \frac{1}{\Gamma\left( \frac{5}{2} \right) 2^{\frac{5}{2}}}\, x^{\frac{5}{2} - 1} e^{-\frac{x}{2}}\, dx \\
&= 0.975 - 0.050 \qquad (\text{from the } \chi^2 \text{ table}) \\
&= 0.925.
\end{aligned}
\]

These integrals are hard to evaluate and so their values are taken from the
chi-square table.

Example 6.17. If X ~ $\chi^2(7)$, then what are values of the constants a and
b such that $P(a < X < b) = 0.95$?

Answer: Since

\[
0.95 = P(a < X < b) = P(X < b) - P(X < a),
\]

we get

\[
P(X < b) = 0.95 + P(X < a).
\]

We choose a = 1.690, so that

\[
P(X < 1.690) = 0.025.
\]

From this, we get

\[
P(X < b) = 0.95 + 0.025 = 0.975.
\]

Thus, from the chi-square table, we get b = 16.01.
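
Table values like those used in Examples 6.16 and 6.17 can be reproduced by integrating the chi-square density numerically. The sketch below is a rough illustration, not a replacement for a statistical library: the helper names are hypothetical, Simpson's rule is an assumed choice of quadrature, and the code assumes r > 2 so that the density is finite at 0.

```python
import math


def chi2_pdf(x, r):
    """Chi-square density with r degrees of freedom (Definition 6.4)."""
    if x < 0:
        return 0.0
    return x ** (r / 2 - 1) * math.exp(-x / 2) / (math.gamma(r / 2) * 2 ** (r / 2))


def chi2_cdf(x, r, n=20_000):
    """P(X <= x) by composite Simpson's rule on [0, x]; needs r > 2 and n even."""
    h = x / n
    total = chi2_pdf(0.0, r) + chi2_pdf(x, r)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, r)
    return total * h / 3


# Example 6.16: X ~ chi-square(5)
p_616 = chi2_cdf(12.83, 5) - chi2_cdf(1.145, 5)

# Example 6.17: X ~ chi-square(7) with a = 1.690 and b = 16.01
p_617 = chi2_cdf(16.01, 7) - chi2_cdf(1.690, 7)

print(round(p_616, 3), round(p_617, 3))
```

The two differences should come out close to 0.925 and 0.95, matching the table-based answers up to rounding.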

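Definition 6.5. A continuous random variable X is said to have an n-Erlang
distribution if its probability density function is of the form

\[
f(x) = \begin{cases} \lambda\, e^{-\lambda x}\, \dfrac{(\lambda x)^{n - 1}}{(n - 1)!} & \text{if } 0 < x < \infty \\[6pt] 0 & \text{otherwise,} \end{cases}
\]

where $\lambda > 0$ is a parameter.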

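The gamma distribution reduces to the n-Erlang distribution if $\alpha = n$, where
n is a positive integer, and $\theta = \frac{1}{\lambda}$. The gamma distribution can be generalized
to include the Weibull distribution. We call this generalized distribution the
unified distribution. The form of this distribution is the following:

\[
f(x) = \begin{cases} \dfrac{\alpha\, \theta^{-\alpha^{\psi}}}{\Gamma\left( \alpha^{\psi} + 1 \right)}\, x^{\alpha - 1}\, e^{-\frac{x^{(\alpha - \alpha^{\psi} + 1)}}{\theta}} & \text{if } 0 < x < \infty \\[8pt] 0 & \text{otherwise,} \end{cases}
\]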

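where $\theta > 0$, $\alpha > 0$, and $\psi \in \{0, 1\}$ are parameters.
If $\psi = 0$, the unified distribution reduces to

\[
f(x) = \begin{cases} \dfrac{\alpha}{\theta}\, x^{\alpha - 1} e^{-\frac{x^{\alpha}}{\theta}} & \text{if } 0 < x < \infty \\[6pt] 0 & \text{otherwise} \end{cases}
\]

which is known as the Weibull distribution. For $\alpha = 1$, the Weibull
distribution becomes an exponential distribution. The Weibull distribution provides
probabilistic models for life-length data of components or systems. The mean
and variance of the Weibull distribution are given by

\[
E(X) = \theta^{\frac{1}{\alpha}}\, \Gamma\left( 1 + \frac{1}{\alpha} \right),
\]

\[
\operatorname{Var}(X) = \theta^{\frac{2}{\alpha}} \left\{ \Gamma\left( 1 + \frac{2}{\alpha} \right) - \left( \Gamma\left( 1 + \frac{1}{\alpha} \right) \right)^2 \right\}.
\]
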
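From this Weibull distribution, one can get the Rayleigh distribution by
taking $\theta = 2\sigma^2$ and $\alpha = 2$. The Rayleigh distribution is given by

\[
f(x) = \begin{cases} \dfrac{x}{\sigma^2}\, e^{-\frac{x^2}{2\sigma^2}} & \text{if } 0 < x < \infty \\[6pt] 0 & \text{otherwise.} \end{cases}
\]

If $\psi = 1$, the unified distribution reduces to the gamma distribution.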
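
The special cases of the Weibull distribution can be checked against its mean and variance formulas. The sketch below assumes the parametrization $f(x) = \frac{\alpha}{\theta} x^{\alpha - 1} e^{-x^{\alpha}/\theta}$, for which $E(X) = \theta^{1/\alpha}\, \Gamma(1 + 1/\alpha)$ and $\operatorname{Var}(X) = \theta^{2/\alpha} \{ \Gamma(1 + 2/\alpha) - \Gamma(1 + 1/\alpha)^2 \}$; the function names and the sample values of $\theta$ and $\sigma$ are illustrative.

```python
import math


def weibull_mean(theta, alpha):
    """Mean of the Weibull density (alpha/theta) x^(alpha-1) exp(-x^alpha/theta)."""
    return theta ** (1 / alpha) * math.gamma(1 + 1 / alpha)


def weibull_var(theta, alpha):
    """Variance of the same Weibull density."""
    g1 = math.gamma(1 + 1 / alpha)
    g2 = math.gamma(1 + 2 / alpha)
    return theta ** (2 / alpha) * (g2 - g1 ** 2)


# alpha = 1: exponential with parameter theta (mean theta, variance theta^2)
theta = 5.0
exp_mean, exp_var = weibull_mean(theta, 1), weibull_var(theta, 1)

# alpha = 2 and theta = 2 sigma^2: Rayleigh distribution, whose mean is
# sigma * sqrt(pi/2) and whose variance is (4 - pi)/2 * sigma^2
sigma = 1.5
ray_mean = weibull_mean(2 * sigma ** 2, 2)
ray_var = weibull_var(2 * sigma ** 2, 2)

print(exp_mean, exp_var)
print(ray_mean, sigma * math.sqrt(math.pi / 2))
print(ray_var, (4 - math.pi) / 2 * sigma ** 2)
```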


6.3. Beta Distribution 


The beta distribution is one of the basic distributions in statistics. It
has many applications in classical as well as Bayesian statistics. It is a
versatile distribution and as such it is used in modeling the behavior of random
variables that are positive but bounded in possible values. Proportions and
percentages fall in this category.

The beta distribution involves the notion of beta function. First we
explain the notion of the beta integral and some of its simple properties. Let
$\alpha$ and $\beta$ be any two positive real numbers. The beta function $B(\alpha, \beta)$ is
defined as

\[
B(\alpha, \beta) = \int_0^1 x^{\alpha - 1} (1 - x)^{\beta - 1}\, dx.
\]

First, we prove a theorem that establishes the connection between the
beta function and the gamma function.

Theorem 6.4. Let $\alpha$ and $\beta$ be any two positive real numbers. Then

\[
B(\alpha, \beta) = \frac{\Gamma(\alpha)\, \Gamma(\beta)}{\Gamma(\alpha + \beta)},
\]

where

\[
\Gamma(z) = \int_0^\infty x^{z - 1} e^{-x}\, dx
\]

is the gamma function.

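Proof: We prove this theorem by computing

\[
\begin{aligned}
\Gamma(\alpha)\, \Gamma(\beta) &= \left( \int_0^\infty x^{\alpha - 1} e^{-x}\, dx \right) \left( \int_0^\infty y^{\beta - 1} e^{-y}\, dy \right) \\
&= \left( \int_0^\infty u^{2\alpha - 2} e^{-u^2}\, 2u\, du \right) \left( \int_0^\infty v^{2\beta - 2} e^{-v^2}\, 2v\, dv \right) \\
&= 4 \int_0^\infty \int_0^\infty u^{2\alpha - 1} v^{2\beta - 1} e^{-(u^2 + v^2)}\, du\, dv \\
&= 4 \int_0^{\frac{\pi}{2}} \int_0^\infty r^{2\alpha + 2\beta - 2} (\cos \theta)^{2\alpha - 1} (\sin \theta)^{2\beta - 1} e^{-r^2}\, r\, dr\, d\theta \\
&= \left( \int_0^\infty (r^2)^{\alpha + \beta - 1} e^{-r^2}\, dr^2 \right) \left( 2 \int_0^{\frac{\pi}{2}} (\cos \theta)^{2\alpha - 1} (\sin \theta)^{2\beta - 1}\, d\theta \right) \\
&= \Gamma(\alpha + \beta) \left( 2 \int_0^{\frac{\pi}{2}} (\cos \theta)^{2\alpha - 1} (\sin \theta)^{2\beta - 1}\, d\theta \right) \\
&= \Gamma(\alpha + \beta) \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1}\, dt \\
&= \Gamma(\alpha + \beta)\, B(\alpha, \beta).
\end{aligned}
\]

The second line in the above integral is obtained by substituting $x = u^2$ and
$y = v^2$. Similarly, the fourth and seventh lines are obtained by substituting
$u = r \cos \theta$, $v = r \sin \theta$, and $t = \cos^2 \theta$, respectively. This proves the theorem.
The following two corollaries are consequences of the last theorem.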

Corollary 6.1. For every positive $\alpha$ and $\beta$, the beta function is symmetric,
that is

\[
B(\alpha, \beta) = B(\beta, \alpha).
\]

Corollary 6.2. For every positive $\alpha$ and $\beta$, the beta function can be written
as

\[
B(\alpha, \beta) = 2 \int_0^{\frac{\pi}{2}} (\cos \theta)^{2\alpha - 1} (\sin \theta)^{2\beta - 1}\, d\theta.
\]

The following corollary is obtained by substituting $s = \frac{x}{1 - x}$ in the definition
of the beta function.

Corollary 6.3. For every positive $\alpha$ and $\beta$, the beta function can be
expressed as

\[
B(\alpha, \beta) = \int_0^\infty \frac{s^{\alpha - 1}}{(1 + s)^{\alpha + \beta}}\, ds.
\]
Using Theorem 6.4 and the property of gamma function, we have the
following corollary.

Corollary 6.4. For every positive real number $\beta$ and every positive integer
$\alpha$, the beta function reduces to

\[
B(\alpha, \beta) = \frac{(\alpha - 1)!}{(\alpha - 1 + \beta)(\alpha - 2 + \beta) \cdots (1 + \beta)\, \beta}.
\]

Corollary 6.5. For every pair of positive integers $\alpha$ and $\beta$, the beta function
satisfies the following recursive relation

\[
B(\alpha, \beta) = \frac{(\alpha - 1)(\beta - 1)}{(\alpha + \beta - 1)(\alpha + \beta - 2)}\, B(\alpha - 1, \beta - 1).
\]
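
The relations between the beta and gamma functions can be verified numerically for a particular pair of arguments. The sketch below is illustrative: the helper names and the choice $\alpha = 3$, $\beta = 10$ are assumptions, and the midpoint rule used for the defining integral assumes both arguments are at least 1 so that the integrand is bounded.

```python
import math


def beta_fn(a, b):
    """B(a, b) computed through the gamma function (Theorem 6.4)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def beta_integral(a, b, n=200_000):
    """Midpoint-rule approximation of the defining integral (assumes a, b >= 1)."""
    h = 1.0 / n
    return h * sum(
        ((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1) for i in range(n)
    )


a, b = 3, 10
by_gamma = beta_fn(a, b)                # Theorem 6.4
by_integral = beta_integral(a, b)       # definition of the beta function

# Corollary 6.4 with integer alpha = 3: (alpha - 1)! / ((2 + beta)(1 + beta) beta)
by_cor_64 = math.factorial(a - 1) / ((2 + b) * (1 + b) * b)

# Corollary 6.5: recursion that lowers both arguments by one
by_cor_65 = (a - 1) * (b - 1) / ((a + b - 1) * (a + b - 2)) * beta_fn(a - 1, b - 1)

print(by_gamma, by_integral, by_cor_64, by_cor_65)
print(beta_fn(2.5, 4.0), beta_fn(4.0, 2.5))   # symmetry, Corollary 6.1
```

All four values in the first line should agree with 1/660, which is about 0.001515.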


Definition 6.6. A random variable X is said to have the beta density
function if its probability density function is of the form

\[
f(x) = \begin{cases} \dfrac{1}{B(\alpha, \beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1} & \text{if } 0 < x < 1 \\[6pt] 0 & \text{otherwise} \end{cases}
\]

for every positive $\alpha$ and $\beta$. If X has a beta distribution, then we symbolically
denote this by writing X ~ BETA($\alpha$, $\beta$).

The following figure illustrates the graph of the beta distribution for
various values of $\alpha$ and $\beta$.

Beta Distributions

The beta distribution reduces to the uniform distribution over (0, 1) if
$\alpha = 1 = \beta$. The following theorem gives the mean and variance of the beta
distribution.
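Theorem 6.5. If X ~ BETA($\alpha$, $\beta$), then

\[
E(X) = \frac{\alpha}{\alpha + \beta}, \qquad
\operatorname{Var}(X) = \frac{\alpha \beta}{(\alpha + \beta)^2\, (\alpha + \beta + 1)}.
\]

Proof: The expected value of X is given by

\[
\begin{aligned}
E(X) &= \int_0^1 x\, f(x)\, dx \\
&= \frac{1}{B(\alpha, \beta)} \int_0^1 x^{\alpha} (1 - x)^{\beta - 1}\, dx \\
&= \frac{B(\alpha + 1, \beta)}{B(\alpha, \beta)} \\
&= \frac{\Gamma(\alpha + 1)\, \Gamma(\beta)}{\Gamma(\alpha + \beta + 1)}\, \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\, \Gamma(\beta)} \\
&= \frac{\alpha\, \Gamma(\alpha)\, \Gamma(\beta)}{(\alpha + \beta)\, \Gamma(\alpha + \beta)}\, \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\, \Gamma(\beta)} \\
&= \frac{\alpha}{\alpha + \beta}.
\end{aligned}
\]

Similarly, we can show that

\[
E\left( X^2 \right) = \frac{\alpha\, (\alpha + 1)}{(\alpha + \beta + 1)\, (\alpha + \beta)}.
\]

Therefore

\[
\operatorname{Var}(X) = E\left( X^2 \right) - \left( E(X) \right)^2 = \frac{\alpha \beta}{(\alpha + \beta)^2\, (\alpha + \beta + 1)}
\]

and the proof of the theorem is now complete.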

Example 6.18. The percentage of impurities per batch in a certain chemical
product is a random variable X that follows the beta distribution given by

\[
f(x) = \begin{cases} 60\, x^3 (1 - x)^2 & \text{for } 0 < x < 1 \\ 0 & \text{otherwise.} \end{cases}
\]

What is the probability that a randomly selected batch will have more than
25% impurities?

Answer: The probability that a randomly selected batch will have more than
25% impurities is given by

\[
\begin{aligned}
P(X \ge 0.25) &= \int_{0.25}^1 60\, x^3 (1 - x)^2\, dx \\
&= 60 \int_{0.25}^1 \left( x^3 - 2x^4 + x^5 \right) dx \\
&= 60 \left[ \frac{x^4}{4} - \frac{2x^5}{5} + \frac{x^6}{6} \right]_{0.25}^1 \\
&= 60\, \frac{657}{40960} \\
&= 0.9624.
\end{aligned}
\]
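
The arithmetic in Example 6.18 can be reproduced exactly with rational numbers. This is a small illustrative check using the `fractions` module from the Python standard library; the function name `antiderivative` is an assumption made for the sketch.

```python
from fractions import Fraction


def antiderivative(x):
    """Antiderivative of x^3 - 2x^4 + x^5, which equals x^3 (1 - x)^2."""
    return x ** 4 / 4 - 2 * x ** 5 / 5 + x ** 6 / 6


lower, upper = Fraction(1, 4), Fraction(1)
bracket = antiderivative(upper) - antiderivative(lower)   # value of the bracket
prob = 60 * bracket                                       # P(X >= 0.25)

print(bracket, prob, float(prob))
```

Working with `Fraction` avoids rounding, so the intermediate value 657/40960 appears exactly before the final decimal is taken.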


Example 6.19. The proportion of time per day that all checkout counters
in a supermarket are busy follows a distribution

\[
f(x) = \begin{cases} k\, x^2 (1 - x)^9 & \text{for } 0 < x < 1 \\ 0 & \text{otherwise.} \end{cases}
\]

What is the value of the constant k so that f(x) is a valid probability density
function?

Answer: Using the definition of the beta function, we get that

\[
\int_0^1 x^2 (1 - x)^9\, dx = B(3, 10).
\]

Hence by Theorem 6.4, we obtain

\[
B(3, 10) = \frac{\Gamma(3)\, \Gamma(10)}{\Gamma(13)} = \frac{1}{660}.
\]

Hence k should be equal to 660.

The beta distribution can be generalized to any bounded interval [a, b].
This generalized distribution is called the generalized beta distribution. If
a random variable X has this generalized beta distribution we denote it by
writing X ~ GBETA($\alpha$, $\beta$, a, b). The probability density of the generalized
beta distribution is given by

\[
f(x) = \begin{cases} \dfrac{1}{B(\alpha, \beta)}\, \dfrac{(x - a)^{\alpha - 1} (b - x)^{\beta - 1}}{(b - a)^{\alpha + \beta - 1}} & \text{if } a < x < b \\[8pt] 0 & \text{otherwise} \end{cases}
\]

where $\alpha, \beta, a > 0$.

If X ~ GBETA($\alpha$, $\beta$, a, b), then

\[
E(X) = (b - a)\, \frac{\alpha}{\alpha + \beta} + a,
\]

\[
\operatorname{Var}(X) = (b - a)^2\, \frac{\alpha \beta}{(\alpha + \beta)^2\, (\alpha + \beta + 1)}.
\]

It can be shown that if X = (b - a)Y + a and Y ~ BETA($\alpha$, $\beta$), then
X ~ GBETA($\alpha$, $\beta$, a, b). Thus using Theorem 6.5, we get

\[
E(X) = E\left( (b - a) Y + a \right) = (b - a)\, E(Y) + a = (b - a)\, \frac{\alpha}{\alpha + \beta} + a
\]

and

\[
\operatorname{Var}(X) = \operatorname{Var}\left( (b - a) Y + a \right) = (b - a)^2\, \operatorname{Var}(Y) = (b - a)^2\, \frac{\alpha \beta}{(\alpha + \beta)^2\, (\alpha + \beta + 1)}.
\]

6.4. Normal Distribution 

Among continuous probability distributions, the normal distribution is
very well known since it arises in many applications. The normal distribution
was discovered by the French mathematician Abraham DeMoivre (1667-1754).
DeMoivre wrote two important books. One is called the Annuities Upon
Lives, the first book on actuarial sciences, and the second book is called the
Doctrine of Chances, one of the early books on probability theory. Pierre-
Simon Laplace (1749-1827) applied the normal distribution to astronomy. Carl
Friedrich Gauss (1777-1855) used the normal distribution in his studies of
problems in physics and astronomy. Adolphe Quetelet (1796-1874) demonstrated
that man's physical traits (such as height, chest expansion, weight etc.) as
well as social traits follow the normal distribution. The main importance of the
normal distribution lies in the central limit theorem, which says that the sample
mean is approximately normally distributed if the sample size is large.

Definition 6.7. A random variable X is said to have a normal distribution
if its probability density function is given by

\[
f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}, \qquad -\infty < x < \infty,
\]

where $-\infty < \mu < \infty$ and $0 < \sigma^2 < \infty$ are arbitrary parameters. If X has a
normal distribution with parameters $\mu$ and $\sigma^2$, then we write X ~ $N(\mu, \sigma^2)$.

PDF of Normal Random Variables



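Example 6.20. Is the real valued function defined by

\[
f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}, \qquad -\infty < x < \infty
\]

a probability density function of some random variable X?

Answer: To answer this question, we must check that f is nonnegative
and it integrates to 1. The nonnegative part is trivial since the exponential
function is always positive. Hence using a property of the gamma function, we
show that f integrates to 1 on $\mathbb{R}$.

\[
\begin{aligned}
\int_{-\infty}^{\infty} f(x)\, dx &= \int_{-\infty}^{\infty} \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}\, dx \\
&= 2 \int_{\mu}^{\infty} \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}\, dx \\
&= \frac{2}{\sigma \sqrt{2\pi}} \int_0^{\infty} e^{-z}\, \frac{\sigma}{\sqrt{2z}}\, dz, \qquad \text{where } z = \frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \\
&= \frac{1}{\sqrt{\pi}} \int_0^{\infty} \frac{1}{\sqrt{z}}\, e^{-z}\, dz \\
&= \frac{1}{\sqrt{\pi}}\, \Gamma\left( \frac{1}{2} \right) \\
&= \frac{1}{\sqrt{\pi}}\, \sqrt{\pi} \\
&= 1.
\end{aligned}
\]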


The following theorem tells us that the parameter $\mu$ is the mean and the
parameter $\sigma^2$ is the variance of the normal distribution.
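Theorem 6.6. If X ~ $N(\mu, \sigma^2)$, then

\[
E(X) = \mu, \qquad
\operatorname{Var}(X) = \sigma^2, \qquad
M(t) = e^{\mu t + \frac{1}{2} \sigma^2 t^2}.
\]

Proof: We prove this theorem by first computing the moment generating
function and finding out the mean and variance of X from it.

\[
\begin{aligned}
M(t) &= E\left( e^{tX} \right) \\
&= \int_{-\infty}^{\infty} e^{tx} f(x)\, dx \\
&= \int_{-\infty}^{\infty} e^{tx}\, \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2} (x - \mu)^2}\, dx \\
&= \int_{-\infty}^{\infty} e^{tx}\, \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2} \left( x^2 - 2\mu x + \mu^2 \right)}\, dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2} \left( x^2 - 2\mu x + \mu^2 - 2\sigma^2 t x \right)}\, dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2} \left( x - \mu - \sigma^2 t \right)^2}\, e^{\mu t + \frac{1}{2} \sigma^2 t^2}\, dx \\
&= e^{\mu t + \frac{1}{2} \sigma^2 t^2} \int_{-\infty}^{\infty} \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2} \left( x - \mu - \sigma^2 t \right)^2}\, dx \\
&= e^{\mu t + \frac{1}{2} \sigma^2 t^2}.
\end{aligned}
\]

The last integral integrates to 1 because the integrand is the probability
density function of a normal random variable whose mean is $\mu + \sigma^2 t$ and
variance $\sigma^2$, that is $N(\mu + \sigma^2 t, \sigma^2)$. Finally, from the moment generating
function one determines the mean and variance of the normal distribution.
We leave this part to the reader.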
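
For readers who want to see the omitted step, one way to carry it out is sketched below. It differentiates $M(t) = e^{\mu t + \frac{1}{2} \sigma^2 t^2}$ twice and evaluates at t = 0, following the same pattern used for the gamma distribution; this is a supplementary derivation rather than part of the original proof.

```latex
\begin{aligned}
M'(t)  &= \left( \mu + \sigma^2 t \right) e^{\mu t + \frac{1}{2} \sigma^2 t^2},
       & M'(0)  &= \mu, \\
M''(t) &= \left[ \sigma^2 + \left( \mu + \sigma^2 t \right)^2 \right] e^{\mu t + \frac{1}{2} \sigma^2 t^2},
       & M''(0) &= \sigma^2 + \mu^2, \\
E(X)   &= M'(0) = \mu,
       & \operatorname{Var}(X) &= M''(0) - \left( M'(0) \right)^2 = \sigma^2 .
\end{aligned}
```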

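Example 6.21. If X is any random variable with mean $\mu$ and variance $\sigma^2 > 0$,
then what are the mean and variance of the random variable $Y = \frac{X - \mu}{\sigma}$?

Answer: The mean of the random variable Y is

\[
\begin{aligned}
E(Y) &= E\left( \frac{X - \mu}{\sigma} \right) \\
&= \frac{1}{\sigma}\, E(X - \mu) \\
&= \frac{1}{\sigma} \left( E(X) - \mu \right) \\
&= \frac{1}{\sigma}\, (\mu - \mu) \\
&= 0.
\end{aligned}
\]

The variance of Y is given by

\[
\begin{aligned}
\operatorname{Var}(Y) &= \operatorname{Var}\left( \frac{X - \mu}{\sigma} \right) \\
&= \frac{1}{\sigma^2}\, \operatorname{Var}(X - \mu) \\
&= \frac{1}{\sigma^2}\, \operatorname{Var}(X) \\
&= \frac{1}{\sigma^2}\, \sigma^2 \\
&= 1.
\end{aligned}
\]

Hence, if we define a new random variable by taking a random variable and
subtracting its mean from it and then dividing the result by its standard
deviation, then this new random variable will have zero mean and unit
variance.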

Definition 6.8. A normal random variable is said to be standard normal, if
its mean is zero and variance is one. We denote a standard normal random
variable X by X ~ N(0, 1).

The probability density function of standard normal distribution is the
following:

\[
f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}, \qquad -\infty < x < \infty.
\]

Example 6.22. If X ~ N(0, 1), what is the probability of the random
variable X less than or equal to -1.72?

Answer:

\[
\begin{aligned}
P(X \le -1.72) &= 1 - P(X \le 1.72) \\
&= 1 - 0.9573 \qquad (\text{from table}) \\
&= 0.0427.
\end{aligned}
\]





Example 6.23. If Z ~ N(0, 1), what is the value of the constant c such
that $P(|Z| \le c) = 0.95$?

Answer:

\[
\begin{aligned}
0.95 &= P(|Z| \le c) \\
&= P(-c \le Z \le c) \\
&= P(Z \le c) - P(Z \le -c) \\
&= 2\, P(Z \le c) - 1.
\end{aligned}
\]

Hence

\[
P(Z \le c) = 0.975,
\]

and from this using the table we get

\[
c = 1.96.
\]


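The following theorem is very important and allows us to find
probabilities by using the standard normal table.

Theorem 6.7. If X ~ $N(\mu, \sigma^2)$, then the random variable $Z = \frac{X - \mu}{\sigma}$ ~ N(0, 1).

Proof: We will show that Z is standard normal by finding the probability
density function of Z. We compute the probability density of Z by the cumulative
distribution function method.

\[
\begin{aligned}
F(z) &= P(Z \le z) \\
&= P\left( \frac{X - \mu}{\sigma} \le z \right) \\
&= P(X \le \sigma z + \mu) \\
&= \int_{-\infty}^{\sigma z + \mu} \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}\, dx \\
&= \int_{-\infty}^{z} \frac{1}{\sigma \sqrt{2\pi}}\, \sigma\, e^{-\frac{1}{2} w^2}\, dw, \qquad \text{where } w = \frac{x - \mu}{\sigma}.
\end{aligned}
\]

Hence

\[
f(z) = F'(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} z^2}.
\]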


The following example illustrates how to use the standard normal table to
find probabilities for normal random variables.

Example 6.24. If X ~ N(3, 16), then what is $P(4 \le X \le 8)$?

Answer:

\[
\begin{aligned}
P(4 \le X \le 8) &= P\left( \frac{4 - 3}{4} \le \frac{X - 3}{4} \le \frac{8 - 3}{4} \right) \\
&= P\left( \frac{1}{4} \le Z \le \frac{5}{4} \right) \\
&= P(Z \le 1.25) - P(Z \le 0.25) \\
&= 0.8944 - 0.5987 \\
&= 0.2957.
\end{aligned}
\]


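Example 6.25. If X ~ N(25, 36), then what is the value of the constant c
such that $P(|X - 25| \le c) = 0.9544$?

Answer:

\[
\begin{aligned}
0.9544 &= P(|X - 25| \le c) \\
&= P(-c \le X - 25 \le c) \\
&= P\left( -\frac{c}{6} \le \frac{X - 25}{6} \le \frac{c}{6} \right) \\
&= P\left( -\frac{c}{6} \le Z \le \frac{c}{6} \right) \\
&= P\left( Z \le \frac{c}{6} \right) - P\left( Z \le -\frac{c}{6} \right) \\
&= 2\, P\left( Z \le \frac{c}{6} \right) - 1.
\end{aligned}
\]

Hence

\[
P\left( Z \le \frac{c}{6} \right) = 0.9772
\]

and from this, using the normal table, we get

\[
\frac{c}{6} = 2 \qquad \text{or} \qquad c = 12.
\]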
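
The table lookups in Examples 6.22 through 6.25 can be reproduced with `statistics.NormalDist` from the Python standard library (available from Python 3.8). The sketch below is illustrative; note that `NormalDist` is parametrized by the standard deviation, not the variance, so N(3, 16) corresponds to `sigma=4`. The computed values may differ from the table-based answers in the last printed digit because the table entries are rounded.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal N(0, 1)

# Example 6.22: P(Z <= -1.72)
p_622 = Z.cdf(-1.72)

# Example 6.23: c with P(|Z| <= c) = 0.95, that is P(Z <= c) = 0.975
c_623 = Z.inv_cdf(0.975)

# Example 6.24: X ~ N(3, 16), computed directly and after standardizing
X = NormalDist(mu=3, sigma=4)
p_624 = X.cdf(8) - X.cdf(4)
p_624_std = Z.cdf((8 - 3) / 4) - Z.cdf((4 - 3) / 4)

# Example 6.25: X ~ N(25, 36); solve 2 P(Z <= c/6) - 1 = 0.9544 for c
c_625 = 6 * Z.inv_cdf((1 + 0.9544) / 2)

print(round(p_622, 4), round(c_623, 2))
print(round(p_624, 4), round(p_624_std, 4), round(c_625, 2))
```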


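The following theorem can be proved similarly to Theorem 6.7.

Theorem 6.8. If X ~ $N(\mu, \sigma^2)$, then the random variable $\left( \frac{X - \mu}{\sigma} \right)^2$ ~ $\chi^2(1)$.

Proof: Let $W = \left( \frac{X - \mu}{\sigma} \right)^2$ and $Z = \frac{X - \mu}{\sigma}$. We will show that the random
variable W is chi-square with 1 degree of freedom. This amounts to showing
that the probability density function of W is

\[
g(w) = \begin{cases} \dfrac{1}{\sqrt{2\pi w}}\, e^{-\frac{1}{2} w} & \text{if } 0 < w < \infty \\[6pt] 0 & \text{otherwise.} \end{cases}
\]

We compute the probability density function of W by the distribution function
method. Let G(w) be the cumulative distribution function of W, which is

\[
\begin{aligned}
G(w) &= P(W \le w) \\
&= P\left( \left( \frac{X - \mu}{\sigma} \right)^2 \le w \right) \\
&= P\left( -\sqrt{w} \le \frac{X - \mu}{\sigma} \le \sqrt{w} \right) \\
&= P\left( -\sqrt{w} \le Z \le \sqrt{w} \right) \\
&= \int_{-\sqrt{w}}^{\sqrt{w}} f(z)\, dz,
\end{aligned}
\]

where f(z) denotes the probability density function of the standard normal
random variable Z. Thus, the probability density function of W is given by

\[
\begin{aligned}
g(w) &= \frac{d}{dw}\, G(w) \\
&= \frac{d}{dw} \int_{-\sqrt{w}}^{\sqrt{w}} f(z)\, dz \\
&= f\left( \sqrt{w} \right) \frac{d \sqrt{w}}{dw} - f\left( -\sqrt{w} \right) \frac{d \left( -\sqrt{w} \right)}{dw} \\
&= \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} w}\, \frac{1}{2 \sqrt{w}} + \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} w}\, \frac{1}{2 \sqrt{w}} \\
&= \frac{1}{\sqrt{2\pi w}}\, e^{-\frac{1}{2} w}.
\end{aligned}
\]

Thus, we have shown that W is chi-square with one degree of freedom and
the proof is now complete.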

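Example 6.26. If X ~ N(7, 4), what is $P\left( 15.364 \le (X - 7)^2 \le 20.095 \right)$?

Answer: Since X ~ N(7, 4), we get $\mu = 7$ and $\sigma = 2$. Thus

\[
\begin{aligned}
P\left( 15.364 \le (X - 7)^2 \le 20.095 \right)
&= P\left( \frac{15.364}{4} \le \left( \frac{X - 7}{2} \right)^2 \le \frac{20.095}{4} \right) \\
&= P\left( 3.841 \le Z^2 \le 5.024 \right) \\
&= P\left( 0 \le Z^2 \le 5.024 \right) - P\left( 0 \le Z^2 \le 3.841 \right) \\
&= 0.975 - 0.950 \\
&= 0.025.
\end{aligned}
\]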
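
Since $Z^2 \le w$ is the same event as $-\sqrt{w} \le Z \le \sqrt{w}$, a chi-square probability with one degree of freedom can be computed from the standard normal distribution alone. The following sketch applies this to Example 6.26; the helper name `chi2_1_cdf` is an assumption, and the result, which comes out near 0.025, may differ in the third decimal place from a value read off a printed table because table entries are rounded.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal N(0, 1)


def chi2_1_cdf(w):
    """P(Z^2 <= w) = P(-sqrt(w) <= Z <= sqrt(w)) for Z ~ N(0, 1) and w >= 0."""
    root = w ** 0.5
    return Z.cdf(root) - Z.cdf(-root)


# Example 6.26: X ~ N(7, 4), so ((X - 7) / 2)^2 is chi-square with 1 degree of freedom.
low, high = 15.364 / 4, 20.095 / 4
p_626 = chi2_1_cdf(high) - chi2_1_cdf(low)

print(round(chi2_1_cdf(high), 3), round(chi2_1_cdf(low), 3), round(p_626, 3))
```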
A generalization of the normal distribution is the following:

\[
g(x) = \frac{\nu\, \varphi(\nu)}{2 \sigma\, \Gamma(1/\nu)}\, e^{-\left[ \varphi(\nu)\, \frac{|x - \mu|}{\sigma} \right]^{\nu}}
\]

where

\[
\varphi(\nu) = \sqrt{\frac{\Gamma(3/\nu)}{\Gamma(1/\nu)}}
\]

and $\nu$ and $\sigma$ are real positive constants and $-\infty < \mu < \infty$ is a real
constant. The constant $\mu$ represents the mean and the constant $\sigma$ represents
the standard deviation of the generalized normal distribution. If $\nu = 2$, then
the generalized normal distribution reduces to the normal distribution. If $\nu = 1$,
then the generalized normal distribution reduces to the Laplace distribution
whose density function is given by

\[
f(x) = \frac{1}{2\theta}\, e^{-\frac{|x - \mu|}{\theta}}
\]

where $\theta = \frac{\sigma}{\sqrt{2}}$. The generalized normal distribution is very useful in signal
processing and in particular modeling of the discrete cosine transform (DCT)
coefficients of a digital image.


6.5. Lognormal Distribution 

The study of the lognormal distribution was initiated by Galton and McAlister
in 1879. They came across this distribution while studying the use of the
geometric mean as an estimate of location. Later, Kapteyn (1903) discussed
the genesis of this distribution. This distribution can be defined as the
distribution of a random variable whose logarithm is normally distributed. Often
the size distribution of organisms, the distribution of species, the
distribution of the number of persons in a census occupation class, the distribution of
stars in the universe, and the distribution of the size of incomes are modeled
by lognormal distributions. The lognormal distribution is used in biology,
astronomy, economics, pharmacology and engineering. This distribution is
sometimes known as the Galton-McAlister distribution. In economics, the
lognormal distribution is called the Cobb-Douglas distribution.

Definition 6.10. A random variable X is said to have a lognormal
distribution if its probability density function is given by

\[
f(x) = \begin{cases} \dfrac{1}{x\, \sigma \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{\ln(x) - \mu}{\sigma} \right)^2} & \text{if } 0 < x < \infty \\[8pt] 0 & \text{otherwise,} \end{cases}
\]

where $-\infty < \mu < \infty$ and $0 < \sigma^2 < \infty$ are arbitrary parameters.

If X has a lognormal distribution with parameters $\mu$ and $\sigma^2$, then we
write X ~ $\Lambda(\mu, \sigma^2)$.

Lognormal Distributions with $\mu = 0$



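Example 6.27. If $X \sim \Lambda(\mu, \sigma^2)$, what is the $100p^{th}$ percentile of X?

Answer: Let q be the $100p^{th}$ percentile of X. Then by definition of
percentile, we get

$$
p = \int_0^q \frac{1}{x\,\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{\ln(x)-\mu}{\sigma}\right)^2}\, dx.
$$

Substituting $z = \frac{\ln(x)-\mu}{\sigma}$ in the above integral, we have

$$
p = \int_{-\infty}^{\frac{\ln(q)-\mu}{\sigma}} \frac{1}{\sqrt{2\pi}}\; e^{-\frac{1}{2}z^2}\, dz
  = \int_{-\infty}^{z_p} \frac{1}{\sqrt{2\pi}}\; e^{-\frac{1}{2}z^2}\, dz,
$$

where $z_p = \frac{\ln(q)-\mu}{\sigma}$ is the $100p^{th}$ percentile of the standard normal
random variable.

Hence the $100p^{th}$ percentile of X is

$$
q = e^{\sigma z_p + \mu},
$$

where $z_p$ is the $100p^{th}$ percentile of the standard normal random variable Z.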
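
The percentile formula q = exp(sigma z_p + mu) translates directly into code. The sketch below is illustrative only: the function names are assumptions, and `statistics.NormalDist` from the Python standard library supplies the standard normal percentile z_p. The last line checks the result by pushing q back through the lognormal distribution function.

```python
from math import exp, log
from statistics import NormalDist

def lognormal_percentile(p, mu, sigma):
    """100p-th percentile q = exp(sigma * z_p + mu) when ln(X) ~ N(mu, sigma^2)."""
    z_p = NormalDist().inv_cdf(p)      # 100p-th percentile of the standard normal
    return exp(sigma * z_p + mu)

def lognormal_cdf(x, mu, sigma):
    """P(X <= x) for x > 0, obtained by standardizing ln(x)."""
    return NormalDist().cdf((log(x) - mu) / sigma)

q = lognormal_percentile(0.90, mu=1.0, sigma=0.5)
print(q, lognormal_cdf(q, mu=1.0, sigma=0.5))   # second value should be close to 0.90
```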
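Theorem 6.9. If $X \sim \Lambda(\mu, \sigma^2)$, then

$$
E(X) = e^{\mu + \frac{1}{2}\sigma^2}
$$

$$
Var(X) = \left[e^{\sigma^2} - 1\right] e^{2\mu + \sigma^2}.
$$

Proof: Let t be a positive integer. We compute the $t^{th}$ moment of X.

$$
E(X^t) = \int_0^\infty x^t f(x)\, dx
       = \int_0^\infty x^t\, \frac{1}{x\,\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{\ln(x)-\mu}{\sigma}\right)^2}\, dx.
$$

Substituting $z = \ln(x)$ in the last integral, we get

$$
E(X^t) = \int_{-\infty}^{\infty} e^{tz}\, \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{z-\mu}{\sigma}\right)^2}\, dz = M_Z(t),
$$

where $M_Z(t)$ denotes the moment generating function of the random variable
$Z \sim N(\mu, \sigma^2)$. Therefore,

$$
M_Z(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}.
$$

Thus letting t = 1, we get

$$
E(X) = e^{\mu + \frac{1}{2}\sigma^2}.
$$

Similarly, taking t = 2, we have

$$
E(X^2) = e^{2\mu + 2\sigma^2}.
$$

Thus, we have

$$
Var(X) = E(X^2) - E(X)^2 = e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2} = \left[e^{\sigma^2} - 1\right] e^{2\mu + \sigma^2}
$$

and now the proof of the theorem is complete.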
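
The moment identity E(X^t) = M_Z(t) used in the proof can be sanity-checked numerically. The sketch below approximates the integral of e^{tz} against the N(mu, sigma^2) density with a plain midpoint rule and compares the resulting mean and variance with the closed forms exp(mu + sigma^2/2) and (exp(sigma^2) - 1) exp(2 mu + sigma^2). The function name, the integration range of 15 standard deviations and the parameter values are arbitrary choices made for illustration.

```python
from math import exp, pi, sqrt

def lognormal_moment(t, mu, sigma, n=20_000):
    """Midpoint-rule approximation of E(X^t) = E(e^{tZ}) with Z ~ N(mu, sigma^2)."""
    lo, hi = mu - 15.0 * sigma, mu + 15.0 * sigma
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * h
        density = exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))
        total += exp(t * z) * density * h
    return total

mu, sigma = 0.3, 0.5
mean = lognormal_moment(1, mu, sigma)
var = lognormal_moment(2, mu, sigma) - mean ** 2
print(mean, exp(mu + 0.5 * sigma ** 2))
print(var, (exp(sigma ** 2) - 1.0) * exp(2.0 * mu + sigma ** 2))
```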

Example 6.28. If $X \sim \Lambda(0, 4)$, then what is the probability that X is
between 1 and 12.1825?

Answer: Since $X \sim \Lambda(0, 4)$, the random variable $Y = \ln(X) \sim N(0, 4)$.
Hence

$$
\begin{aligned}
P(1 \le X \le 12.1825) &= P(\ln(1) \le \ln(X) \le \ln(12.1825)) \\
&= P(0 \le Y \le 2.50) \\
&= P(0 \le Z \le 1.25) \\
&= P(Z \le 1.25) - P(Z \le 0) \\
&= 0.8944 - 0.5000 \\
&= 0.3944.
\end{aligned}
$$
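
The same probability can be computed without tables. The sketch below is a minimal illustration using only the Python standard library; the function name is an assumption. Since the variance of ln(X) is 4, the standard deviation passed to `NormalDist` is 2, and the difference 0.8944 - 0.5000 evaluates to about 0.3944.

```python
from math import log
from statistics import NormalDist

def lognormal_prob(a, b, mu, sigma):
    """P(a <= X <= b) when ln(X) ~ N(mu, sigma^2), for 0 < a < b."""
    y = NormalDist(mu, sigma)              # distribution of Y = ln(X)
    return y.cdf(log(b)) - y.cdf(log(a))

p = lognormal_prob(1.0, 12.1825, mu=0.0, sigma=2.0)   # variance 4 means sigma = 2
print(round(p, 4))
```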




6.6. Inverse Gaussian Distribution 

If a sufficiently small macroscopic particle is suspended in a fluid that is
in thermal equilibrium, the particle will move about erratically in response
to natural collisional bombardments by the individual molecules of the fluid.
This erratic motion is called "Brownian motion" after the botanist Robert
Brown (1773-1858) who first observed this erratic motion in 1828.
Independently, Einstein (1905) and Smoluchowski (1906) gave the mathematical
description of Brownian motion. The distribution of the first passage time
in Brownian motion is the inverse Gaussian distribution. This distribution
was systematically studied by Tweedie in 1945. The interpurchase times of
toothpaste of a family, the duration of labor strikes in a geographical region,
word frequency in a language, conversion time for convertible bonds, length
of employee service, and crop field size follow the inverse Gaussian
distribution. The inverse Gaussian distribution is very useful for the
analysis of certain skewed data.

Definition 6.10. A random variable X is said to have an inverse Gaussian
distribution if its probability density function is given by

$$
f(x) =
\begin{cases}
\sqrt{\dfrac{\lambda}{2\pi}}\; x^{-\frac{3}{2}}\; e^{-\frac{\lambda (x-\mu)^2}{2\mu^2 x}}, & \text{if } 0 < x < \infty \\[2mm]
0 & \text{otherwise},
\end{cases}
$$

where $0 < \mu < \infty$ and $0 < \lambda < \infty$ are arbitrary parameters.

If X has an inverse Gaussian distribution with parameters $\mu$ and $\lambda$, then
we write $X \sim IG(\mu, \lambda)$.

[Figure: Inverse normal distributions with $\lambda = 1$]

The characteristic function $\phi(t)$ of $X \sim IG(\mu, \lambda)$ is

$$
\phi(t) = E\left(e^{itX}\right) = e^{\frac{\lambda}{\mu}\left[1 - \sqrt{1 - \frac{2i\mu^2 t}{\lambda}}\,\right]}.
$$

Using this, we have the following theorem.

Theorem 6.10. If $X \sim IG(\mu, \lambda)$, then

$$
E(X) = \mu
$$

$$
Var(X) = \frac{\mu^3}{\lambda}.
$$

Proof: Since $\phi(t) = E\left(e^{itX}\right)$, the derivative $\phi'(t) = i\,E\left(X e^{itX}\right)$. Therefore
$\phi'(0) = i\,E(X)$. We know the characteristic function $\phi(t)$ of $X \sim IG(\mu, \lambda)$
is

$$
\phi(t) = e^{\frac{\lambda}{\mu}\left[1 - \sqrt{1 - \frac{2i\mu^2 t}{\lambda}}\,\right]}.
$$

Differentiating $\phi(t)$ with respect to t, we have

$$
\begin{aligned}
\phi'(t) &= \frac{d}{dt}\, e^{\frac{\lambda}{\mu}\left[1 - \sqrt{1 - \frac{2i\mu^2 t}{\lambda}}\,\right]} \\
&= e^{\frac{\lambda}{\mu}\left[1 - \sqrt{1 - \frac{2i\mu^2 t}{\lambda}}\,\right]}\;
   \frac{d}{dt}\left(\frac{\lambda}{\mu}\left[1 - \sqrt{1 - \frac{2i\mu^2 t}{\lambda}}\,\right]\right) \\
&= i\mu\; e^{\frac{\lambda}{\mu}\left[1 - \sqrt{1 - \frac{2i\mu^2 t}{\lambda}}\,\right]}
   \left[1 - \frac{2i\mu^2 t}{\lambda}\right]^{-\frac{1}{2}}.
\end{aligned}
$$

Hence $\phi'(0) = i\mu$. Therefore, $E(X) = \mu$. Similarly, one can show that

$$
Var(X) = \frac{\mu^3}{\lambda}.
$$

This completes the proof of the theorem.

The distribution function F(x) of the inverse Gaussian random variable
X with parameters $\mu$ and $\lambda$ was computed by Shuster (1968) as

$$
F(x) = \Phi\left(\sqrt{\frac{\lambda}{x}}\left[\frac{x}{\mu} - 1\right]\right)
     + e^{\frac{2\lambda}{\mu}}\; \Phi\left(-\sqrt{\frac{\lambda}{x}}\left[\frac{x}{\mu} + 1\right]\right),
$$

where $\Phi$ is the distribution function of the standard normal distribution.
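
Shuster's closed form is easy to evaluate because it only needs the standard normal distribution function. The sketch below assumes the standard inverse Gaussian density sqrt(lambda / (2 pi)) x^(-3/2) exp(-lambda (x - mu)^2 / (2 mu^2 x)) and the closed form F(x) = Phi(sqrt(lambda/x) (x/mu - 1)) + exp(2 lambda / mu) Phi(-sqrt(lambda/x) (x/mu + 1)), and compares the latter with a midpoint-rule integral of the density. All function names are illustrative.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf   # standard normal distribution function

def ig_pdf(x, mu, lam):
    """Inverse Gaussian density with parameters mu and lam (lambda)."""
    if x <= 0:
        return 0.0
    return sqrt(lam / (2.0 * pi)) * x ** -1.5 * exp(-lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x))

def ig_cdf(x, mu, lam):
    """Closed-form distribution function in terms of PHI."""
    if x <= 0:
        return 0.0
    r = sqrt(lam / x)
    return PHI(r * (x / mu - 1.0)) + exp(2.0 * lam / mu) * PHI(-r * (x / mu + 1.0))

def integrate_pdf(x, mu, lam, n=20_000):
    """Midpoint-rule approximation of the integral of the density over (0, x)."""
    h = x / n
    return sum(ig_pdf((i + 0.5) * h, mu, lam) for i in range(n)) * h

print(ig_cdf(1.0, 1.0, 1.0), integrate_pdf(1.0, 1.0, 1.0))
```

The two printed numbers should agree to several decimal places.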


6.7. Logistic Distribution

The logistic distribution is often considered as an alternative to the
univariate normal distribution. The logistic distribution has a shape very close
to that of a normal distribution but has heavier tails than the normal. The
logistic distribution is used in modeling demographic data. It is also used as
an alternative to the Weibull distribution in life-testing.

Definition 6.11. A random variable X is said to have a logistic distribution
if its probability density function is given by

$$
f(x) = \frac{\pi}{\sigma\sqrt{3}}\;
\frac{e^{-\frac{\pi}{\sqrt{3}}\left(\frac{x-\mu}{\sigma}\right)}}
     {\left[1 + e^{-\frac{\pi}{\sqrt{3}}\left(\frac{x-\mu}{\sigma}\right)}\right]^2},
\qquad -\infty < x < \infty,
$$

where $-\infty < \mu < \infty$ and $\sigma > 0$ are parameters.

[Figure: Logistic distributions with $\mu = 5$]

If X has a logistic distribution with parameters $\mu$ and $\sigma$, then we write
$X \sim LOG(\mu, \sigma)$.

Theorem 6.11. If $X \sim LOG(\mu, \sigma)$, then

$$
E(X) = \mu
$$

$$
Var(X) = \sigma^2
$$

$$
M(t) = e^{\mu t}\; \Gamma\left(1 + \frac{\sqrt{3}}{\pi}\,\sigma t\right) \Gamma\left(1 - \frac{\sqrt{3}}{\pi}\,\sigma t\right),
\qquad |t| < \frac{\pi}{\sigma\sqrt{3}}.
$$

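Proof: First, we derive the moment generating function of X and then we
compute the mean and variance of it. The moment generating function is

$$
\begin{aligned}
M(t) &= \int_{-\infty}^{\infty} e^{tx} f(x)\, dx \\
&= \int_{-\infty}^{\infty} e^{tx}\; \frac{\pi}{\sigma\sqrt{3}}\;
   \frac{e^{-\frac{\pi}{\sqrt{3}}\left(\frac{x-\mu}{\sigma}\right)}}
        {\left[1 + e^{-\frac{\pi}{\sqrt{3}}\left(\frac{x-\mu}{\sigma}\right)}\right]^2}\, dx \\
&= e^{\mu t} \int_{-\infty}^{\infty} e^{sw}\, \frac{e^{-w}}{\left(1 + e^{-w}\right)^2}\, dw,
   \qquad \text{where } w = \frac{\pi (x-\mu)}{\sqrt{3}\,\sigma} \text{ and } s = \frac{\sqrt{3}\,\sigma}{\pi}\, t \\
&= e^{\mu t} \int_0^1 \left(z^{-1} - 1\right)^{-s} dz,
   \qquad \text{where } z = \frac{1}{1 + e^{-w}} \\
&= e^{\mu t} \int_0^1 z^{s} (1 - z)^{-s}\, dz \\
&= e^{\mu t}\, B(1 + s,\, 1 - s) \\
&= e^{\mu t}\, \frac{\Gamma(1 + s)\,\Gamma(1 - s)}{\Gamma(1 + s + 1 - s)} \\
&= e^{\mu t}\, \frac{\Gamma(1 + s)\,\Gamma(1 - s)}{\Gamma(2)} \\
&= e^{\mu t}\, \Gamma(1 + s)\,\Gamma(1 - s) \\
&= e^{\mu t}\; \Gamma\left(1 + \frac{\sqrt{3}}{\pi}\,\sigma t\right) \Gamma\left(1 - \frac{\sqrt{3}}{\pi}\,\sigma t\right) \\
&= e^{\mu t}\, \sqrt{3}\,\sigma t\; \operatorname{cosec}\left(\sqrt{3}\,\sigma t\right).
\end{aligned}
$$

The last step uses the identity $\Gamma(1+s)\,\Gamma(1-s) = \pi s\, \operatorname{cosec}(\pi s)$ with
$\pi s = \sqrt{3}\,\sigma t$. We leave the rest of the proof to the reader.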


6.8. Review Exercises 


1. If $Y \sim UNIF(0, 1)$, then what is the probability density function of
$X = -\ln Y$?

2. Let the probability density function of X be

$$
f(x) =
\begin{cases}
e^{-x} & \text{if } x > 0 \\
0 & \text{otherwise}.
\end{cases}
$$

Let $Y = 1 - e^{-X}$. Find the distribution of Y.

3. After a certain time the weight W of crystals formed is given
approximately by $W = e^X$ where $X \sim N(\mu, \sigma^2)$. What is the probability density
function of W for $0 < w < \infty$?

4. What is the probability that a normal random variable with mean 6 and 
standard deviation 3 will fall between 5.7 and 7.5 ? 

5. Let X have a distribution with the 75 th percentile equal to ^ and proba¬ 
bility density function equal to 

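$$
f(x) =
\begin{cases}
\lambda\, e^{-\lambda x} & \text{for } 0 < x < \infty \\
0 & \text{otherwise}.
\end{cases}
$$

What is the value of the parameter $\lambda$?

6. If a normal distribution with mean $\mu$ and variance $\sigma^2 > 0$ has $46^{th}$
percentile equal to $20\sigma$, then what is $\mu$ in terms of the standard deviation?

7. Let X be a random variable with cumulative distribution function

$$
F(x) =
\begin{cases}
0 & \text{if } x \le 0 \\
1 - e^{-x} & \text{if } x > 0.
\end{cases}
$$

What is $P\left(0 \le e^X \le 4\right)$?

8. Let X have the density function

$$
f(x) =
\begin{cases}
\dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\; x^{\alpha-1} (1-x)^{\beta-1} & \text{for } 0 < x < 1 \\[2mm]
0 & \text{otherwise},
\end{cases}
$$

where $\alpha > 0$ and $\beta > 0$. If $\beta = 6$ and $\alpha = 5$, what is the mean of the random
variable $(1 - X)^{-1}$?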

9. R.A. Fisher proved that when $n \ge 30$ and Y has a chi-square distribution
with n degrees of freedom, then $\sqrt{2Y} - \sqrt{2n-1}$ has an approximate standard
normal distribution. Under this approximation, what is the $90^{th}$ percentile
of Y when n = 41?

10. Let Y have a chi-square distribution with 32 degrees of freedom so that
its variance is 64. If $P(Y > c) = 0.0668$, then what is the approximate value
of the constant c?

11. If in a certain normal distribution of X, the probability is 0.5 that X is
less than 500 and 0.0227 that X is greater than 650. What is the standard
deviation of X?

12. If $X \sim N(5, 4)$, then what is the probability that $8 < Y < 13$ where
$Y = 2X + 1$?


13. Given the probability density function of a random variable X as

$$
f(x) =
\begin{cases}
\theta\, e^{-\theta x} & \text{if } x > 0 \\
0 & \text{otherwise},
\end{cases}
$$

what is the $n^{th}$ moment of X about the origin?

14. If the random variable X is normal with mean 1 and standard deviation
2, then what is $P\left(X^2 - 2X \le 8\right)$?

15. Suppose X has a standard normal distribution and $Y = e^X$. What is
the $k^{th}$ moment of Y?


16. If the random variable X has uniform distribution on the interval [0, a],
what is the probability that the random variable is greater than its square,
that is $P\left(X > X^2\right)$?

17. If the random variable Y has a chi-square distribution with 54 degrees 
of freedom, then what is the approximate 84 th percentile of Y? 

18. Let X be a continuous random variable with density function 


f(x) = 



for 1 < x < 2 
elsewhere. 


If Y = y/X, what is the density function for Y where nonzero? 

19. If X is normal with mean 0 and variance 4, then what is the probability 
of the event X — £ > 0, that is P (X — ^ > 0)? 

20. If the waiting time at Rally's drive-in-window is normally distributed
with mean 13 minutes and standard deviation 2 minutes, then what percentage
of customers wait longer than 10 minutes but less than 15 minutes?

21. If X is uniform on the interval from -5 to 5, what is the probability that
the quadratic equation $100t^2 + 20tX + 2X + 3 = 0$ has complex solutions?

22. If the random variable $X \sim EXP(\theta)$, then what is the probability density
function of the random variable $Y = X\sqrt{X}$?

23. If the random variable $X \sim N(0, 1)$, then what is the probability density
function of the random variable $Y = \sqrt{|X|}$?

24. If the random variable $X \sim \Lambda(\mu, \sigma^2)$, then what is the probability
density function of the random variable ln(X)?

25. If the random variable $X \sim \Lambda(\mu, \sigma^2)$, then what is the mode of X?

26. If the random variable $X \sim \Lambda(\mu, \sigma^2)$, then what is the median of X?

27. If the random variable $X \sim \Lambda(\mu, \sigma^2)$, then what is the probability that
the quadratic equation $4t^2 + 4tX + X + 2 = 0$ has real solutions?

28. Consider the Karl Pearson’s differential equation p(x) + q(x) y = 0 
where p(x) = a + bx + cx 2 and q(x) = x — d. Show that if a = c = 0, 
b > 0, d > —b, then y(x) is gamma; and if a = 0, b = —c, < 1, | > —1, 
then y(x) is beta. 

29. Let $a, b, \alpha, \beta$ be any four real numbers with $a < b$ and $\alpha, \beta$ positive.
If $X \sim BETA(\alpha, \beta)$, then what is the probability density function of the
random variable $Y = (b - a)X + a$?

30. A nonnegative continuous random variable X is said to be memoryless if
$P(X > s + t \,/\, X > t) = P(X > s)$ for all $s, t \ge 0$. Show that the exponential
random variable is memoryless.

31. Show that every nonnegative continuous memoryless random variable is
an exponential random variable.

32. Using the gamma function, evaluate the following integrals:

(i) $\int_0^\infty e^{-x^2}\, dx$; (ii) $\int_0^\infty x\, e^{-x^2}\, dx$; (iii) $\int_0^\infty x^2 e^{-x^2}\, dx$; (iv) $\int_0^\infty x^3 e^{-x^2}\, dx$.

33. Using the beta function, evaluate the following integrals:

(i) $\int_0^1 x^2 (1 - x)^2\, dx$; (ii) $\int_0^{100} x^5 (100 - x)^7\, dx$; (iii) $\int_0^1 x^{11} (1 - x^3)^7\, dx$.

34. If $\Gamma(z)$ denotes the gamma function, then prove that

$$
\Gamma(1 + t)\, \Gamma(1 - t) = \pi t\, \operatorname{cosec}(\pi t).
$$


35. Let $\alpha$ and $\beta$ be given positive real numbers, with $\alpha < \beta$. If two points
are selected at random from a straight line segment of length $\beta$, what is the
probability that the distance between them is at least $\alpha$?

36. If the random variable $X \sim GAM(\theta, \alpha)$, then what is the $n^{th}$ moment
of X about the origin?

Chapter 7

TWO RANDOM VARIABLES 


There are many random experiments that involve more than one random 
variable. For example, an educator may study the joint behavior of grades 
and time devoted to study; a physician may study the joint behavior of blood 
pressure and weight. Similarly an economist may study the joint behavior of 
business volume and profit. In fact, most real problems we come across will 
have more than one underlying random variable of interest. 

7.1. Bivariate Discrete Random Variables 

In this section, we develop all the necessary terminology for studying
bivariate discrete random variables.

Definition 7.1. A discrete bivariate random variable (X, Y) is an ordered
pair of discrete random variables.

Definition 7.2. Let (X, Y) be a bivariate random variable and let $R_X$ and
$R_Y$ be the range spaces of X and Y, respectively. A real-valued function
$f : R_X \times R_Y \to \mathbb{R}$ is called a joint probability density function for X and Y
if and only if

$$
f(x, y) = P(X = x, Y = y)
$$

for all $(x, y) \in R_X \times R_Y$. Here, the event (X = x, Y = y) means the
intersection of the events (X = x) and (Y = y), that is

$$
(X = x) \cap (Y = y).
$$


Example 7.1. Roll a pair of unbiased dice. If X denotes the smaller and
Y denotes the larger outcome on the dice, then what is the joint probability
density function of X and Y?

Answer: The sample space S of rolling two dice consists of

    {(1,1)  (1,2)  (1,3)  (1,4)  (1,5)  (1,6)
     (2,1)  (2,2)  (2,3)  (2,4)  (2,5)  (2,6)
     (3,1)  (3,2)  (3,3)  (3,4)  (3,5)  (3,6)
     (4,1)  (4,2)  (4,3)  (4,4)  (4,5)  (4,6)
     (5,1)  (5,2)  (5,3)  (5,4)  (5,5)  (5,6)
     (6,1)  (6,2)  (6,3)  (6,4)  (6,5)  (6,6)}

The probability density function f(x, y) can be computed for X = 2 and
Y = 3 as follows: There are two outcomes, namely (2,3) and (3,2), in the
sample space S of 36 outcomes which contribute to the joint event
(X = 2, Y = 3). Hence

$$
f(2, 3) = P(X = 2, Y = 3) = \frac{2}{36}.
$$

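Similarly, we can compute the rest of the probabilities. The following table
shows these probabilities (rows are values of y, columns are values of x):

    y = 6 |  2/36   2/36   2/36   2/36   2/36   1/36
    y = 5 |  2/36   2/36   2/36   2/36   1/36    0
    y = 4 |  2/36   2/36   2/36   1/36    0      0
    y = 3 |  2/36   2/36   1/36    0      0      0
    y = 2 |  2/36   1/36    0      0      0      0
    y = 1 |  1/36    0      0      0      0      0
          +------------------------------------------
            x = 1  x = 2  x = 3  x = 4  x = 5  x = 6

These tabulated values can be written as

$$
f(x, y) =
\begin{cases}
\frac{1}{36} & \text{if } 1 \le x = y \le 6 \\[1mm]
\frac{2}{36} & \text{if } 1 \le x < y \le 6 \\[1mm]
0 & \text{otherwise}.
\end{cases}
$$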
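
The table for the two dice can be reproduced by brute-force enumeration of the 36 equally likely outcomes. The sketch below is illustrative; the variable names are assumptions, and exact fractions are used so that no rounding enters.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 outcomes give each (smaller, larger) pair.
counts = Counter((min(a, b), max(a, b)) for a, b in product(range(1, 7), repeat=2))

# Joint pmf of X = smaller outcome and Y = larger outcome.
joint = {xy: Fraction(n, 36) for xy, n in counts.items()}

print(joint[(2, 3)])        # contributed by the outcomes (2,3) and (3,2)
print(joint[(4, 4)])        # contributed by the single outcome (4,4)
print(sum(joint.values()))  # total probability
```

Pairs with x > y never occur, so they are simply absent from `joint`, which corresponds to the zero entries of the table.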


Example 7.2. A group of 9 executives of a certain firm include 4 who
are married, 3 who never married, and 2 who are divorced. Three of the
executives are to be selected for promotion. Let X denote the number of
married executives and Y the number of never married executives among
the 3 selected for promotion. Assuming that the three are randomly selected
from the nine available, what is the joint probability density function of the
random variables X and Y?

Answer: The number of ways we can choose 3 out of 9 is $\binom{9}{3}$, which is 84.
Thus

$$
\begin{aligned}
f(0, 0) &= P(X = 0, Y = 0) = \frac{0}{84} = 0 \\
f(1, 0) &= P(X = 1, Y = 0) = \frac{4}{84} \\
f(2, 0) &= P(X = 2, Y = 0) = \frac{12}{84} \\
f(3, 0) &= P(X = 3, Y = 0) = \frac{4}{84}.
\end{aligned}
$$

Similarly, we can find the rest of the probabilities. The following table gives
the complete information about these probabilities (rows are values of y,
columns are values of x):

    y = 3 |  1/84     0       0       0
    y = 2 |  6/84   12/84     0       0
    y = 1 |  3/84   24/84   18/84     0
    y = 0 |   0      4/84   12/84    4/84
          +--------------------------------
            x = 0   x = 1   x = 2   x = 3
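
Each entry of the table for the executives is a ratio of counts: the number of ways to pick x married, y never-married and 3 - x - y divorced executives, divided by the 84 ways to pick any 3 of the 9. The sketch below is a minimal illustration of that counting argument; the function name and its keyword parameters are assumptions made for the example.

```python
from fractions import Fraction
from math import comb

def joint_pmf(x, y, married=4, never=3, divorced=2, chosen=3):
    """P(X = x, Y = y): x married and y never-married among the executives chosen."""
    rest = chosen - x - y               # number of divorced executives selected
    if x < 0 or y < 0 or rest < 0:
        return Fraction(0)
    # comb(n, k) is 0 when k > n, so impossible selections get probability 0.
    ways = comb(married, x) * comb(never, y) * comb(divorced, rest)
    return Fraction(ways, comb(married + never + divorced, chosen))

for y in (3, 2, 1, 0):
    print([str(joint_pmf(x, y)) for x in range(4)])
```

Note that `Fraction` prints values in lowest terms, so 12/84 appears as 1/7.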


Definition 7.3. Let (X, Y) be a discrete bivariate random variable. Let
$R_X$ and $R_Y$ be the range spaces of X and Y, respectively. Let f(x, y) be the
joint probability density function of X and Y. The function

$$
f_1(x) = \sum_{y \in R_Y} f(x, y)
$$

is called the marginal probability density function of X. Similarly, the
function

$$
f_2(y) = \sum_{x \in R_X} f(x, y)
$$

is called the marginal probability density function of Y.

The following diagram illustrates the concept of marginal graphically. 



Example 7.3. If the joint probability density function of the discrete random
variables X and Y is given by

$$
f(x, y) =
\begin{cases}
\frac{1}{36} & \text{if } 1 \le x = y \le 6 \\[1mm]
\frac{2}{36} & \text{if } 1 \le x < y \le 6 \\[1mm]
0 & \text{otherwise},
\end{cases}
$$

then what are the marginals of X and Y?


Answer: The marginal of X can be obtained by summing the joint
probability density function f(x, y) for all y values in the range space $R_Y$ of the
random variable Y. That is

$$
\begin{aligned}
f_1(x) &= \sum_{y \in R_Y} f(x, y) \\
&= \sum_{y=1}^{6} f(x, y) \\
&= f(x, x) + \sum_{y > x} f(x, y) + \sum_{y < x} f(x, y) \\
&= \frac{1}{36} + (6 - x)\,\frac{2}{36} + 0 \\
&= \frac{1}{36}\,[13 - 2x], \qquad x = 1, 2, \ldots, 6.
\end{aligned}
$$

Similarly, one can obtain the marginal probability density of Y by summing
over all x values in the range space $R_X$ of the random variable X. Hence

$$
\begin{aligned}
f_2(y) &= \sum_{x \in R_X} f(x, y) \\
&= \sum_{x=1}^{6} f(x, y) \\
&= f(y, y) + \sum_{x < y} f(x, y) + \sum_{x > y} f(x, y) \\
&= \frac{1}{36} + (y - 1)\,\frac{2}{36} + 0 \\
&= \frac{1}{36}\,[2y - 1], \qquad y = 1, 2, \ldots, 6.
\end{aligned}
$$

Example 7.4. Let X and Y be discrete random variables with joint
probability density function

$$
f(x, y) =
\begin{cases}
\frac{1}{21}\,(x + y) & \text{if } x = 1, 2;\ y = 1, 2, 3 \\[1mm]
0 & \text{otherwise}.
\end{cases}
$$

What are the marginal probability density functions of X and Y?

Answer: The marginal of X is given by

$$
\begin{aligned}
f_1(x) &= \sum_{y=1}^{3} \frac{1}{21}\,(x + y) \\
&= \frac{1}{21}\, 3x + \frac{1}{21}\,[1 + 2 + 3] \\
&= \frac{x + 2}{7}, \qquad x = 1, 2.
\end{aligned}
$$

Similarly, the marginal of Y is given by

$$
\begin{aligned}
f_2(y) &= \sum_{x=1}^{2} \frac{1}{21}\,(x + y) \\
&= \frac{2y}{21} + \frac{3}{21} \\
&= \frac{3 + 2y}{21}, \qquad y = 1, 2, 3.
\end{aligned}
$$

From the above examples, note that the marginal $f_1(x)$ is obtained by
summing across the columns. Similarly, the marginal $f_2(y)$ is obtained by
summing across the rows.

The following theorem follows from the definition of the joint probability
density function.

Theorem 7.1. A real valued function f of two variables is a joint probability
density function of a pair of discrete random variables X and Y (with range
spaces $R_X$ and $R_Y$, respectively) if and only if

(a) $f(x, y) \ge 0$ for all $(x, y) \in R_X \times R_Y$,

(b) $\sum_{x \in R_X} \sum_{y \in R_Y} f(x, y) = 1$.


Example 7.5. For what value of the constant k is the function given by

$$
f(x, y) =
\begin{cases}
k\,xy & \text{if } x = 1, 2, 3;\ y = 1, 2, 3 \\
0 & \text{otherwise}
\end{cases}
$$

a joint probability density function of some random variables X and Y?

Answer: Since

$$
\begin{aligned}
1 &= \sum_{x=1}^{3} \sum_{y=1}^{3} f(x, y) \\
&= \sum_{x=1}^{3} \sum_{y=1}^{3} k\,xy \\
&= k\,[1 + 2 + 3 + 2 + 4 + 6 + 3 + 6 + 9] \\
&= 36\,k,
\end{aligned}
$$

we have

$$
k = \frac{1}{36}
$$

and the corresponding density function is given by

$$
f(x, y) =
\begin{cases}
\frac{1}{36}\,xy & \text{if } x = 1, 2, 3;\ y = 1, 2, 3 \\[1mm]
0 & \text{otherwise}.
\end{cases}
$$
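
The normalizing constant of Example 7.5 comes from condition (b) of Theorem 7.1: the probabilities must add up to one. The sketch below is a minimal illustration with exact fractions; the variable names are assumptions.

```python
from fractions import Fraction

support = [(x, y) for x in (1, 2, 3) for y in (1, 2, 3)]

# Condition (b): k * sum of x*y over the support must equal 1.
k = Fraction(1, sum(x * y for x, y in support))
pmf = {(x, y): k * x * y for x, y in support}

print(k)
print(all(p >= 0 for p in pmf.values()), sum(pmf.values()))   # conditions (a) and (b)
```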


As in the case of one random variable, there are many situations where
one wants to know the probability that the values of two random variables
are less than or equal to some real numbers x and y.

Definition 7.4. Let X and Y be any two discrete random variables. The
real valued function $F : \mathbb{R}^2 \to \mathbb{R}$ is called the joint cumulative probability
distribution function of X and Y if and only if

$$
F(x, y) = P(X \le x, Y \le y)
$$

for all $(x, y) \in \mathbb{R}^2$. Here, the event $(X \le x, Y \le y)$ means $(X \le x) \cap (Y \le y)$.
From this definition it can be shown that for any real numbers a, b, c and d

$$
P(a < X \le b,\ c < Y \le d) = F(b, d) + F(a, c) - F(a, d) - F(b, c).
$$

Further, one can also show that

$$
F(x, y) = \sum_{s \le x} \sum_{t \le y} f(s, t)
$$

where (s, t) is any pair of nonnegative numbers.


7.2. Bivariate Continuous Random Variables 


In this section, we shall extend the idea of probability density functions 
of one random variable to that of two random variables. 

Definition 7.5. The joint probability density function of the random
variables X and Y is an integrable function f(x, y) such that

(a) $f(x, y) \ge 0$ for all $(x, y) \in \mathbb{R}^2$; and

(b) $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, dx\, dy = 1$.

Example 7.6. Let the joint density function of X and Y be given by

$$
f(x, y) =
\begin{cases}
k\,x y^2 & \text{if } 0 < x < y < 1 \\
0 & \text{otherwise}.
\end{cases}
$$

What is the value of the constant k?

Answer: Since f is a joint probability density function, we have

$$
\begin{aligned}
1 &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, dx\, dy \\
&= \int_0^1 \left[\int_0^y k\,x y^2\, dx\right] dy \\
&= \frac{k}{2} \int_0^1 y^4\, dy \\
&= \frac{k}{10}.
\end{aligned}
$$

Hence k = 10.

If we know the joint probability density function f of the random
variables X and Y, then we can compute the probability of the event A from

$$
P(A) = \iint_A f(x, y)\, dx\, dy.
$$


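Example 7.7. Let the joint density of the continuous random variables X
and Y be

$$
f(x, y) =
\begin{cases}
\frac{6}{5}\left(x^2 + 2xy\right) & \text{if } 0 \le x \le 1;\ 0 \le y \le 1 \\[1mm]
0 & \text{elsewhere}.
\end{cases}
$$

What is the probability of the event $(X \le Y)$?

Answer: Let $A = (X \le Y)$. We want to find

$$
\begin{aligned}
P(A) &= \iint_A f(x, y)\, dx\, dy \\
&= \int_0^1 \left[\int_0^y \frac{6}{5}\left(x^2 + 2xy\right) dx\right] dy \\
&= \frac{6}{5} \int_0^1 \left[\frac{x^3}{3} + x^2 y\right]_{x=0}^{x=y} dy \\
&= \frac{6}{5} \int_0^1 \frac{4}{3}\, y^3\, dy \\
&= \frac{2}{5} \left[y^4\right]_0^1 \\
&= \frac{2}{5}.
\end{aligned}
$$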
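
The iterated integral of Example 7.7 can be approximated numerically in the same order of integration: for each y, integrate over 0 <= x <= y, then integrate the result over 0 <= y <= 1. The sketch below uses a simple midpoint rule and assumes the density (6/5)(x^2 + 2xy) on the unit square; the function names and the grid size are arbitrary illustrative choices.

```python
def f(x, y):
    """Joint density of Example 7.7 on the unit square, zero elsewhere."""
    if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
        return 1.2 * (x * x + 2.0 * x * y)
    return 0.0

def prob_x_le_y(n=400):
    """Midpoint-rule approximation of the integral of f over 0 <= x <= y <= 1."""
    total = 0.0
    hy = 1.0 / n
    for j in range(n):
        y = (j + 0.5) * hy
        hx = y / n                                   # inner range is 0 <= x <= y
        inner = sum(f((i + 0.5) * hx, y) for i in range(n)) * hx
        total += inner * hy
    return total

print(prob_x_le_y())   # should be close to 2/5
```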


Definition 7.6. Let (X, Y) be a continuous bivariate random variable. Let
f(x, y) be the joint probability density function of X and Y. The function

$$
f_1(x) = \int_{-\infty}^{\infty} f(x, y)\, dy
$$

is called the marginal probability density function of X. Similarly, the
function

$$
f_2(y) = \int_{-\infty}^{\infty} f(x, y)\, dx
$$

is called the marginal probability density function of Y.

Example 7.8. If the joint density function for X and Y is given by

$$
f(x, y) =
\begin{cases}
\frac{3}{4} & \text{for } 0 < y^2 < x < 1 \\[1mm]
0 & \text{otherwise},
\end{cases}
$$

then what is the marginal density function of X, for 0 < x < 1?

Answer: The domain of f consists of the region bounded by the curve
$x = y^2$ and the vertical line x = 1. (See the figure on the next page.)
For a fixed x in (0, 1), the variable y ranges from $-\sqrt{x}$ to $\sqrt{x}$. Hence

$$
f_1(x) = \int_{-\sqrt{x}}^{\sqrt{x}} \frac{3}{4}\, dy = \frac{3}{2}\sqrt{x}, \qquad 0 < x < 1.
$$

Example 7.9. Let X and Y have joint density function

$$
f(x, y) =
\begin{cases}
2\,e^{-x-y} & \text{for } 0 < x \le y < \infty \\
0 & \text{otherwise}.
\end{cases}
$$

What is the marginal density of X where nonzero?

Answer: The marginal density of X is given by

$$
\begin{aligned}
f_1(x) &= \int_{-\infty}^{\infty} f(x, y)\, dy \\
&= \int_x^{\infty} 2\,e^{-x-y}\, dy \\
&= 2\,e^{-x} \int_x^{\infty} e^{-y}\, dy \\
&= 2\,e^{-x} \left[-e^{-y}\right]_x^{\infty} \\
&= 2\,e^{-x}\, e^{-x} \\
&= 2\,e^{-2x}, \qquad 0 < x < \infty.
\end{aligned}
$$




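Example 7.10. Let (X, Y) be distributed uniformly on the circular disk
centered at (0, 0) with radius $\frac{1}{\sqrt{\pi}}$. What is the marginal density function of
X where nonzero?

Answer: The equation of a circle with radius $\frac{1}{\sqrt{\pi}}$ and center at the origin is

$$
x^2 + y^2 = \frac{1}{\pi}.
$$

Hence, solving this equation for y, we get

$$
y = \pm \sqrt{\frac{1}{\pi} - x^2}.
$$

Since the disk has area 1, the joint density equals 1 on the disk. Thus, the
marginal density of X is given by

$$
f_1(x) = \int_{-\sqrt{\frac{1}{\pi} - x^2}}^{\sqrt{\frac{1}{\pi} - x^2}} 1\, dy
       = 2\,\sqrt{\frac{1}{\pi} - x^2}, \qquad -\frac{1}{\sqrt{\pi}} < x < \frac{1}{\sqrt{\pi}}.
$$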

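Definition 7.7. Let X and Y be continuous random variables with
joint probability density function f(x, y). The joint cumulative distribution
function F(x, y) of X and Y is defined as

$$
F(x, y) = P(X \le x, Y \le y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f(u, v)\, du\, dv
$$

for all $(x, y) \in \mathbb{R}^2$.

From the fundamental theorem of calculus, we again obtain

$$
f(x, y) = \frac{\partial^2 F}{\partial x\, \partial y}.
$$


Example 7.11. If the joint cumulative distribution function of X and Y is
given by

$$
F(x, y) =
\begin{cases}
\frac{1}{5}\left(2x^3 y + 3x^2 y^2\right) & \text{for } 0 < x, y < 1 \\[1mm]
0 & \text{elsewhere},
\end{cases}
$$

then what is the joint density of X and Y?

Answer:

$$
\begin{aligned}
f(x, y) &= \frac{1}{5}\, \frac{\partial}{\partial x} \frac{\partial}{\partial y} \left(2x^3 y + 3x^2 y^2\right) \\
&= \frac{1}{5}\, \frac{\partial}{\partial x} \left(2x^3 + 6x^2 y\right) \\
&= \frac{1}{5} \left(6x^2 + 12xy\right) \\
&= \frac{6}{5} \left(x^2 + 2xy\right).
\end{aligned}
$$

Hence, the joint density of X and Y is given by

$$
f(x, y) =
\begin{cases}
\frac{6}{5}\left(x^2 + 2xy\right) & \text{for } 0 < x, y < 1 \\[1mm]
0 & \text{elsewhere}.
\end{cases}
$$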

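Example 7.12. Let X and Y have the joint density function

$$
f(x, y) =
\begin{cases}
2x & \text{for } 0 < x < 1;\ 0 < y < 1 \\
0 & \text{elsewhere}.
\end{cases}
$$

What is $P\left(X + Y \le 1 \,/\, X \le \frac{1}{2}\right)$?

Answer: (See the diagram below.)

$$
\begin{aligned}
P\left(X + Y \le 1 \,/\, X \le \tfrac{1}{2}\right)
&= \frac{P\left[(X + Y \le 1) \cap \left(X \le \tfrac{1}{2}\right)\right]}{P\left(X \le \tfrac{1}{2}\right)} \\[1mm]
&= \frac{\int_0^{\frac{1}{2}} \left[\int_0^{\frac{1}{2}} 2x\, dx\right] dy + \int_{\frac{1}{2}}^{1} \left[\int_0^{1-y} 2x\, dx\right] dy}
        {\int_0^1 \left[\int_0^{\frac{1}{2}} 2x\, dx\right] dy} \\[1mm]
&= \frac{\;\frac{1}{6}\;}{\frac{1}{4}} \\[1mm]
&= \frac{2}{3}.
\end{aligned}
$$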
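Example 7.13. Let X and Y have the joint density function

$$
f(x, y) =
\begin{cases}
x + y & \text{for } 0 < x < 1;\ 0 < y < 1 \\
0 & \text{elsewhere}.
\end{cases}
$$

What is $P(2X \le 1 \,/\, X + Y \le 1)$?


Answer: We know that

$$
P(2X \le 1 \,/\, X + Y \le 1) = \frac{P\left[\left(X \le \tfrac{1}{2}\right) \cap (X + Y \le 1)\right]}{P(X + Y \le 1)}.
$$

First,

$$
\begin{aligned}
P[X + Y \le 1] &= \int_0^1 \left[\int_0^{1-x} (x + y)\, dy\right] dx \\
&= \left[\frac{x^2}{2} - \frac{x^3}{3} - \frac{(1 - x)^3}{6}\right]_0^1 \\
&= \frac{1}{3}.
\end{aligned}
$$

Similarly

$$
\begin{aligned}
P\left[\left(X \le \tfrac{1}{2}\right) \cap (X + Y \le 1)\right]
&= \int_0^{\frac{1}{2}} \int_0^{1-x} (x + y)\, dy\, dx \\
&= \left[\frac{x^2}{2} - \frac{x^3}{3} - \frac{(1 - x)^3}{6}\right]_0^{\frac{1}{2}} \\
&= \frac{11}{48}.
\end{aligned}
$$

Thus,

$$
P(2X \le 1 \,/\, X + Y \le 1) = \left(\frac{11}{48}\right) \left(\frac{3}{1}\right) = \frac{11}{16}.
$$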


7.3. Conditional Distributions 

First, we motivate the definition of conditional distribution using
discrete random variables and then based on this motivation we give a general
definition of the conditional distribution. Let X and Y be two discrete
random variables with joint probability density f(x, y). Then by definition of
the joint probability density, we have

$$
f(x, y) = P(X = x, Y = y).
$$

If A = {X = x}, B = {Y = y} and $f_2(y) = P(Y = y)$, then from the above
equation we have

$$
\begin{aligned}
P(\{X = x\} / \{Y = y\}) &= P(A / B) \\
&= \frac{P(A \cap B)}{P(B)} \\
&= \frac{P(\{X = x\} \text{ and } \{Y = y\})}{P(Y = y)} \\
&= \frac{f(x, y)}{f_2(y)}.
\end{aligned}
$$

If we write P({X = x} / {Y = y}) as g(x / y), then we have

$$
g(x / y) = \frac{f(x, y)}{f_2(y)}.
$$

For the discrete bivariate random variables, we can write the conditional
probability of the event {X = x} given the event {Y = y} as the ratio of the
probability of the event $\{X = x\} \cap \{Y = y\}$ to the probability of the event
{Y = y}, which is

$$
g(x / y) = \frac{f(x, y)}{f_2(y)}.
$$

We use this fact to define the conditional probability density function given
two random variables X and Y.


Definition 7.8. Let X and Y be any two random variables with joint density
f(x, y) and marginals $f_1(x)$ and $f_2(y)$. The conditional probability density
function g of X, given (the event) Y = y, is defined as

$$
g(x / y) = \frac{f(x, y)}{f_2(y)}, \qquad f_2(y) > 0.
$$

Similarly, the conditional probability density function h of Y, given (the
event) X = x, is defined as

$$
h(y / x) = \frac{f(x, y)}{f_1(x)}, \qquad f_1(x) > 0.
$$

Example 7.14. Let X and Y be discrete random variables with joint
probability function

$$
f(x, y) =
\begin{cases}
\frac{1}{21}\,(x + y) & \text{for } x = 1, 2, 3;\ y = 1, 2 \\[1mm]
0 & \text{elsewhere}.
\end{cases}
$$

What is the conditional probability density function of X, given Y = 2?

Answer: We want to find g(x / 2). Since

$$
g(x / 2) = \frac{f(x, 2)}{f_2(2)},
$$

we should first compute the marginal of Y, that is $f_2(2)$. The marginal of Y
is given by

$$
f_2(y) = \sum_{x=1}^{3} \frac{1}{21}\,(x + y) = \frac{1}{21}\,(6 + 3y).
$$

Hence $f_2(2) = \frac{12}{21}$. Thus, the conditional probability density function of X,
given Y = 2, is

$$
g(x / 2) = \frac{f(x, 2)}{f_2(2)} = \frac{\frac{1}{21}\,(x + 2)}{\frac{12}{21}} = \frac{1}{12}\,(x + 2), \qquad x = 1, 2, 3.
$$

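Example 7.15. Let X and Y be discrete random variables with joint
probability density function

$$
f(x, y) =
\begin{cases}
\frac{x + y}{32} & \text{for } x = 1, 2;\ y = 1, 2, 3, 4 \\[1mm]
0 & \text{otherwise}.
\end{cases}
$$

What is the conditional probability of Y given X = x?

Answer: The marginal of X is

$$
f_1(x) = \sum_{y=1}^{4} f(x, y) = \frac{1}{32} \sum_{y=1}^{4} (x + y) = \frac{1}{32}\,(4x + 10).
$$

Therefore

$$
h(y / x) = \frac{f(x, y)}{f_1(x)} = \frac{\frac{1}{32}\,(x + y)}{\frac{1}{32}\,(4x + 10)} = \frac{x + y}{4x + 10}.
$$

Thus, the conditional probability of Y given X = x is

$$
h(y / x) =
\begin{cases}
\frac{x + y}{4x + 10} & \text{for } x = 1, 2;\ y = 1, 2, 3, 4 \\[1mm]
0 & \text{otherwise}.
\end{cases}
$$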
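
The recipe "joint divided by marginal" of Example 7.15 can be written as a short function. The sketch below is illustrative, with names chosen to mirror the notation h(y / x); exact fractions make it easy to compare against (x + y) / (4x + 10).

```python
from fractions import Fraction

Y_VALUES = (1, 2, 3, 4)

def f(x, y):
    """Joint pmf of Example 7.15: (x + y) / 32 on the support, 0 elsewhere."""
    if x in (1, 2) and y in Y_VALUES:
        return Fraction(x + y, 32)
    return Fraction(0)

def h(y, x):
    """Conditional pmf of Y given X = x, that is f(x, y) / f1(x)."""
    f1 = sum(f(x, t) for t in Y_VALUES)      # marginal of X at x
    return f(x, y) / f1

for x in (1, 2):
    print(x, [str(h(y, x)) for y in Y_VALUES])
```

For each fixed x the conditional probabilities add up to one, as they must.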


Example 7.16. Let X and Y be continuous random variables with joint pdf

$$
f(x, y) =
\begin{cases}
12x & \text{for } 0 < y < 2x < 1 \\
0 & \text{otherwise}.
\end{cases}
$$

What is the conditional density function of Y given X = x?


Answer: First, we have to find the marginal of X.

$$
f_1(x) = \int_{-\infty}^{\infty} f(x, y)\, dy = \int_0^{2x} 12x\, dy = 24x^2, \qquad 0 < x < \frac{1}{2}.
$$

Thus, the conditional density of Y given X = x is

$$
h(y / x) = \frac{f(x, y)}{f_1(x)} = \frac{12x}{24x^2} = \frac{1}{2x}, \qquad \text{for } 0 < y < 2x < 1
$$

and zero elsewhere.

[Figure: Conditional density of Y given X = x]



Example 7.17. Let X and Y be random variables such that X has density
function

$$
f_1(x) =
\begin{cases}
24x^2 & \text{for } 0 < x < \frac{1}{2} \\
0 & \text{elsewhere}
\end{cases}
$$

and the conditional density of Y given X = x is

$$
h(y / x) =
\begin{cases}
\frac{y}{2x^2} & \text{for } 0 < y < 2x \\[1mm]
0 & \text{elsewhere}.
\end{cases}
$$

What is the conditional density of X given Y = y over the appropriate
domain?

Answer: The joint density f(x, y) of X and Y is given by

$$
\begin{aligned}
f(x, y) &= h(y / x)\, f_1(x) \\
&= \frac{y}{2x^2}\; 24x^2 \\
&= 12y \qquad \text{for } 0 < y < 2x < 1.
\end{aligned}
$$

The marginal density of Y is given by

$$
\begin{aligned}
f_2(y) &= \int_{-\infty}^{\infty} f(x, y)\, dx \\
&= \int_{\frac{y}{2}}^{\frac{1}{2}} 12y\, dx \\
&= 6y\,(1 - y), \qquad \text{for } 0 < y < 1.
\end{aligned}
$$

Hence, the conditional density of X given Y = y is

$$
g(x / y) = \frac{f(x, y)}{f_2(y)} = \frac{12y}{6y\,(1 - y)} = \frac{2}{1 - y}.
$$

Thus, the conditional density of X given Y = y is given by

$$
g(x / y) =
\begin{cases}
\frac{2}{1 - y} & \text{for } 0 < y < 2x < 1 \\[1mm]
0 & \text{otherwise}.
\end{cases}
$$

Note that for a specific x, the function f(x, y) is the intersection (profile)
of the surface z = f(x, y) by the plane x = constant. The conditional density
h(y / x) is the profile of f(x, y) normalized by the factor $\frac{1}{f_1(x)}$.


7.4. Independence of Random Variables

In this section, we define the concept of stochastic independence of two
random variables X and Y. The conditional probability density function g
of X given Y = y usually depends on y. If g is independent of y, then the
random variables X and Y are said to be independent. This motivates the
following definition.

Definition 7.8. Let X and Y be any two random variables with joint density
f(x, y) and marginals $f_1(x)$ and $f_2(y)$. The random variables X and Y are
(stochastically) independent if and only if

$$
f(x, y) = f_1(x)\, f_2(y)
$$

for all $(x, y) \in R_X \times R_Y$.

Example 7.18. Let X and Y be discrete random variables with joint density

$$
f(x, y) =
\begin{cases}
\frac{1}{36} & \text{for } 1 \le x = y \le 6 \\[1mm]
\frac{2}{36} & \text{for } 1 \le x < y \le 6.
\end{cases}
$$

Are X and Y stochastically independent?

Answer: The marginals of X and Y are given by

$$
\begin{aligned}
f_1(x) &= \sum_{y=1}^{6} f(x, y) \\
&= f(x, x) + \sum_{y > x} f(x, y) + \sum_{y < x} f(x, y) \\
&= \frac{1}{36} + (6 - x)\,\frac{2}{36} + 0 \\
&= \frac{13 - 2x}{36}, \qquad \text{for } x = 1, 2, \ldots, 6
\end{aligned}
$$

and

$$
\begin{aligned}
f_2(y) &= \sum_{x=1}^{6} f(x, y) \\
&= f(y, y) + \sum_{x < y} f(x, y) + \sum_{x > y} f(x, y) \\
&= \frac{1}{36} + (y - 1)\,\frac{2}{36} + 0 \\
&= \frac{2y - 1}{36}, \qquad \text{for } y = 1, 2, \ldots, 6.
\end{aligned}
$$

Since

$$
f(1, 1) = \frac{1}{36} \ne \frac{11}{36} \cdot \frac{1}{36} = f_1(1)\, f_2(1),
$$

we conclude that $f(x, y) \ne f_1(x)\, f_2(y)$, and X and Y are not independent.

This example also illustrates that the marginals of X and Y can be
determined if one knows the joint density f(x, y). However, if one knows the
marginals of X and Y, then it is not possible to find the joint density of X
and Y unless the random variables are independent.
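
The independence test of Example 7.18 amounts to comparing f(x, y) with the product f1(x) f2(y) at every point of the range space. The sketch below is a minimal illustration with exact fractions; the names are assumptions that mirror the notation of the example.

```python
from fractions import Fraction

values = range(1, 7)

def f(x, y):
    """Joint pmf of Example 7.18: 1/36 if x = y, 2/36 if x < y, 0 otherwise."""
    if x == y:
        return Fraction(1, 36)
    return Fraction(2, 36) if x < y else Fraction(0)

f1 = {x: sum(f(x, y) for y in values) for x in values}   # marginal of X
f2 = {y: sum(f(x, y) for x in values) for y in values}   # marginal of Y

# X and Y are independent only if the product rule holds at every point.
independent = all(f(x, y) == f1[x] * f2[y] for x in values for y in values)
print(f(1, 1), f1[1] * f2[1], independent)
```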

Example 7.19. Let X and Y have the joint density

$$
f(x, y) =
\begin{cases}
e^{-(x+y)} & \text{for } 0 < x, y < \infty \\
0 & \text{otherwise}.
\end{cases}
$$

Are X and Y stochastically independent?

Answer: The marginals of X and Y are given by

$$
f_1(x) = \int_0^{\infty} f(x, y)\, dy = \int_0^{\infty} e^{-(x+y)}\, dy = e^{-x}
$$

and

$$
f_2(y) = \int_0^{\infty} f(x, y)\, dx = \int_0^{\infty} e^{-(x+y)}\, dx = e^{-y}.
$$

Hence

$$
f(x, y) = e^{-(x+y)} = e^{-x}\, e^{-y} = f_1(x)\, f_2(y).
$$

Thus, X and Y are stochastically independent.

Notice that if the joint density f(x, y) of X and Y can be factored into 
two nonnegative functions, one solely depending on x and the other solely 
depending on y, then X and Y are independent. We can use this factorization 
approach to predict when X and Y are not independent. 

Example 7.20. Let X and Y have the joint density

$$
f(x, y) =
\begin{cases}
x + y & \text{for } 0 < x < 1;\ 0 < y < 1 \\
0 & \text{otherwise}.
\end{cases}
$$

Are X and Y stochastically independent?

Answer: Notice that

$$
f(x, y) = x + y = x \left(1 + \frac{y}{x}\right).
$$

Thus, the joint density cannot be factored into two nonnegative functions
one depending on x and the other depending on y, and therefore X and Y
are not independent.

If X and Y are independent, then the random variables $U = \phi(X)$ and
$V = \psi(Y)$ are also independent. Here $\phi, \psi : \mathbb{R} \to \mathbb{R}$ are some real valued
functions. From this comment, one can conclude that if X and Y are
independent, then the random variables $e^X$ and $Y^3 + Y^2 + 1$ are also independent.

Definition 7.9. The random variables X and Y are said to be independent
and identically distributed (IID) if and only if they are independent and have
the same distribution.

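Example 7.21. Let X and Y be two independent random variables with
identical probability density function given by

$$
f(x) =
\begin{cases}
e^{-x} & \text{for } x > 0 \\
0 & \text{elsewhere}.
\end{cases}
$$

What is the probability density function of W = min{X, Y}?

Answer: Let G(w) be the cumulative distribution function of W. Then

$$
\begin{aligned}
G(w) &= P(W \le w) \\
&= 1 - P(W > w) \\
&= 1 - P(\min\{X, Y\} > w) \\
&= 1 - P(X > w \text{ and } Y > w) \\
&= 1 - P(X > w)\, P(Y > w) \qquad \text{(since X and Y are independent)} \\
&= 1 - \left(\int_w^{\infty} e^{-x}\, dx\right) \left(\int_w^{\infty} e^{-y}\, dy\right) \\
&= 1 - \left(e^{-w}\right)^2 \\
&= 1 - e^{-2w}.
\end{aligned}
$$

Thus, the probability density function of W is

$$
g(w) = \frac{d}{dw}\, G(w) = \frac{d}{dw} \left(1 - e^{-2w}\right) = 2\,e^{-2w}.
$$

Hence

$$
g(w) =
\begin{cases}
2\,e^{-2w} & \text{for } w > 0 \\
0 & \text{elsewhere}.
\end{cases}
$$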
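
A quick simulation gives an informal check of the result for the minimum. The sketch below draws pairs of independent exponential variables with rate 1 and compares the sample mean of the minimum with 1/2 (the mean of the density 2e^{-2w}) and the empirical value of P(W <= 0.5) with 1 - e^{-1}. The seed and sample size are arbitrary choices, and the agreement is only approximate because of sampling error.

```python
import random
from math import exp

rng = random.Random(0)            # fixed seed so the run is repeatable
n = 100_000

# W = min{X, Y} for independent X, Y with density e^{-x}, x > 0
w = [min(rng.expovariate(1.0), rng.expovariate(1.0)) for _ in range(n)]

sample_mean = sum(w) / n                          # theory: 1/2
frac_below_half = sum(v <= 0.5 for v in w) / n    # theory: G(0.5) = 1 - e^{-1}
print(sample_mean, frac_below_half, 1.0 - exp(-1.0))
```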
[Figure: PDF of X and Y and PDF of Min(X, Y)]



7.5. Review Exercises 

1. Let X and Y be discrete random variables with joint probability density 
function 

$$f(x, y) = \begin{cases} \frac{1}{21}(x + y) & \text{for } x = 1, 2, 3;\ y = 1, 2 \\ 0 & \text{otherwise.} \end{cases}$$

What are the marginals of X and Y? 

2. Roll a pair of unbiased dice. Let X be the maximum of the two faces and 
Y be the sum of the two faces. What is the joint density of X and Y ? 

3. For what value of c is the real valued function 

$$f(x, y) = \begin{cases} c\,(x + 2y) & \text{for } x = 1, 2;\ y = 1, 2 \\ 0 & \text{otherwise} \end{cases}$$

a joint density for some random variables X and Y ? 

4. Let X and Y have the joint density 

$$f(x, y) = \begin{cases} e^{-(x+y)} & \text{for } 0 < x, y < \infty \\ 0 & \text{otherwise.} \end{cases}$$

What is P(X > Y > 2)?

5. If the random variable X is uniform on the interval from -1 to 1, and the
random variable Y is uniform on the interval from 0 to 1, what is the
probability that the quadratic equation $t^2 + 2Xt + Y = 0$ has real solutions?
Assume X and Y are independent.

6. Let Y have a uniform distribution on the interval (0,1), and let the
conditional density of X given Y = y be uniform on the interval from 0 to
$\sqrt{y}$. What is the marginal density of X for 0 < x < 1?
7. If the joint cumulative distribution of the random variables X and Y is

$$F(x, y) = \begin{cases} (1 - e^{-x})(1 - e^{-y}) & \text{for } x > 0,\ y > 0 \\ 0 & \text{otherwise,} \end{cases}$$

what is the joint probability density function of the random variables X and
Y, and the probability P(1 < X < 3, 1 < Y < 2)?

8. If the random variables X and Y have the joint density 

$$f(x, y) = \begin{cases} \frac{6}{7}\, x & \text{for } 1 < x + y < 2,\ x > 0,\ y > 0 \\ 0 & \text{otherwise,} \end{cases}$$

what is the probability $P(Y > X^2)$?

9. If the random variables X and Y have the joint density 

for 1 < x + y < 2, x > 0, y > 0

f(x, y) = < 

[ 0 otherwise, 


what is the probability P[max(X, Y) > 1]?

10. Let X and Y have the joint probability density function 

$$f(x, y) = \begin{cases} \frac{5}{16}\, x y^2 & \text{for } 0 < x < y < 2 \\ 0 & \text{elsewhere.} \end{cases}$$

What is the marginal density function of X where it is nonzero? 

11. Let X and Y have the joint probability density function 

$$f(x, y) = \begin{cases} 4x & \text{for } 0 < x < \sqrt{y} < 1 \\ 0 & \text{elsewhere.} \end{cases}$$

What is the marginal density function of Y, where nonzero? 

12. A point (X, Y) is chosen at random from a uniform distribution on the
circular disk of radius 1 centered at the point (1,1). For a given value of X = x
between 0 and 2 and for y in the appropriate domain, what is the conditional
density function for Y?
13. Let X and Y be continuous random variables with joint density function

$$f(x, y) = \begin{cases} \frac{3}{4}(2 - x - y) & \text{for } 0 < x, y < 2;\ 0 < x + y < 2 \\ 0 & \text{otherwise.} \end{cases}$$

What is the conditional probability P(X < 1 | Y < 1)?

14. Let X and Y be continuous random variables with joint density function 

$$f(x, y) = \begin{cases} 12x & \text{for } 0 < y < 2x < 1 \\ 0 & \text{otherwise.} \end{cases}$$

What is the conditional density function of Y given X = x ? 

15. Let X and Y be continuous random variables with joint density function 

$$f(x, y) = \begin{cases} 24xy & \text{for } x > 0,\ y > 0,\ 0 < x + y < 1 \\ 0 & \text{otherwise.} \end{cases}$$

What is the conditional probability P(X<g|y=j)? 

16. Let X and Y be two independent random variables with identical
probability density function given by

$$f(x) = \begin{cases} e^{-x} & \text{for } x > 0 \\ 0 & \text{elsewhere.} \end{cases}$$

What is the probability density function of W = max{X, Y} ? 

17. Let X and Y be two independent random variables with identical prob¬ 
ability density function given by 


f(x) 


for $0 < x < \theta$
0 elsewhere,


for some $\theta > 0$. What is the probability density function of W = min{X, Y}?

18. Ron and Glenna agree to meet between 5 P.M. and 6 P.M. Suppose
that each of them arrives at a time distributed uniformly at random in this
time interval, independent of the other. Each will wait for the other at most
10 minutes (and if the other does not show up, will leave). What is the
probability that they actually go out?
19. Let X and Y be two independent random variables distributed uniformly
on the interval [0,1]. What is the probability of the event Y > g given that
Y > 1 - 2X?

20. Let X and Y have the joint density 

$$f(x, y) = \begin{cases} 8xy & \text{for } 0 < y < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$

What is P (X + Y > 1) ? 

21. Let X and Y be continuous random variables with joint density function 

$$f(x, y) = \begin{cases} 2 & \text{for } 0 < y < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$

Are X and Y stochastically independent? 

22. Let X and Y be continuous random variables with joint density function 

$$f(x, y) = \begin{cases} 2x & \text{for } 0 < x, y < 1 \\ 0 & \text{otherwise.} \end{cases}$$

Are X and Y stochastically independent? 

23. A bus and a passenger arrive at a bus stop at a uniformly distributed 
time over the interval 0 to 1 hour. Assume the arrival times of the bus and 
passenger are independent of one another and that the passenger will wait 
up to 15 minutes for the bus to arrive. What is the probability that the 
passenger will catch the bus? 

24. Let X and Y be continuous random variables with joint density function 

$$f(x, y) = \begin{cases} 4xy & \text{for } 0 < x, y < 1 \\ 0 & \text{otherwise.} \end{cases}$$

What is the probability of the event X < \ given that Y > |? 

25. Let X and Y be continuous random variables with joint density function 

$$f(x, y) = \begin{cases} \frac{1}{2} & \text{for } 0 < x < y < 2 \\ 0 & \text{otherwise.} \end{cases}$$


What is the probability of the event X < ^ given that Y = 1? 
26. If the joint density of the random variables X and Y is

$$f(x, y) = \begin{cases} 1 & \text{if } 0 < x < y < 1 \\ \frac{1}{2} & \text{if } 1 < x < 2,\ 0 < y < 1 \\ 0 & \text{otherwise,} \end{cases}$$

what is the probability of the event (X < |, Y < ^)? 

27. If the joint density of the random variables X and Y is 


f(x,y) 


[ e min {x,y} _ e ~(x+y) if 0 < X , y < 00 

0 otherwise, 


then what is the marginal density function of X, where nonzero? 
Chapter 8


PRODUCT MOMENTS OF BIVARIATE RANDOM VARIABLES


In this chapter, we define various product moments of a bivariate random 
variable. The main concept we introduce in this chapter is the notion of 
covariance between two random variables. Using this notion, we study the 
statistical dependence of two random variables. 

8.1. Covariance of Bivariate Random Variables 

First, we define the notion of product moment of two random variables 
and then using this product moment, we give the definition of covariance 
between two random variables. 

Definition 8.1. Let X and Y be any two random variables with joint density 
function f(x,y). The product moment of X and Y, denoted by E(XY), is 
defined as 

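$$E(XY) = \begin{cases} \displaystyle \sum_{x \in R_X} \sum_{y \in R_Y} x\, y\, f(x, y) & \text{if X and Y are discrete} \\[2ex] \displaystyle \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x\, y\, f(x, y)\, dx\, dy & \text{if X and Y are continuous.} \end{cases}$$

Here, $R_X$ and $R_Y$ represent the range spaces of X and Y respectively.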

Definition 8.2. Let X and Y be any two random variables with joint density
function f(x,y). The covariance between X and Y, denoted by Cov(X, Y)
(or $\sigma_{XY}$), is defined as

$$Cov(X, Y) = E\left( (X - \mu_X)(Y - \mu_Y) \right),$$

where $\mu_X$ and $\mu_Y$ are the means of X and Y, respectively.

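Notice that the covariance of X and Y is really the product moment of
$X - \mu_X$ and $Y - \mu_Y$. Further, the mean $\mu_X$ of X is given by

$$\mu_X = E(X) = \int_{-\infty}^{\infty} x\, f_1(x)\, dx = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x\, f(x, y)\, dx\, dy,$$

and similarly the mean of Y is given by

$$\mu_Y = E(Y) = \int_{-\infty}^{\infty} y\, f_2(y)\, dy = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y\, f(x, y)\, dy\, dx.$$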

Theorem 8.1. Let X and Y be any two random variables. Then

$$Cov(X, Y) = E(XY) - E(X)\, E(Y).$$

Proof:

$$\begin{aligned}
Cov(X, Y) &= E\left( (X - \mu_X)(Y - \mu_Y) \right) \\
&= E\left( XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y \right) \\
&= E(XY) - \mu_X E(Y) - \mu_Y E(X) + \mu_X \mu_Y \\
&= E(XY) - \mu_X \mu_Y - \mu_Y \mu_X + \mu_X \mu_Y \\
&= E(XY) - \mu_X \mu_Y \\
&= E(XY) - E(X)\, E(Y).
\end{aligned}$$

Corollary 8.1. $Cov(X, X) = \sigma_X^2$.

Proof:

$$\begin{aligned}
Cov(X, X) &= E(XX) - E(X)\, E(X) \\
&= E(X^2) - \mu_X^2 \\
&= Var(X) \\
&= \sigma_X^2.
\end{aligned}$$

Example 8.1. Let X and Y be discrete random variables with joint density

$$f(x, y) = \begin{cases} \frac{x + 2y}{18} & \text{for } x = 1, 2;\ y = 1, 2 \\ 0 & \text{elsewhere.} \end{cases}$$

What is the covariance $\sigma_{XY}$ between X and Y?
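Answer: The marginal of X is

$$f_1(x) = \sum_{y=1}^{2} \frac{x + 2y}{18} = \frac{1}{18}(2x + 6).$$

Hence the expected value of X is

$$\begin{aligned}
E(X) &= \sum_{x=1}^{2} x\, f_1(x) \\
&= 1\, f_1(1) + 2\, f_1(2) \\
&= \frac{8}{18} + 2\, \frac{10}{18} \\
&= \frac{28}{18}.
\end{aligned}$$

Similarly, the marginal of Y is

$$f_2(y) = \sum_{x=1}^{2} \frac{x + 2y}{18} = \frac{1}{18}(3 + 4y).$$

Hence the expected value of Y is

$$\begin{aligned}
E(Y) &= \sum_{y=1}^{2} y\, f_2(y) \\
&= 1\, f_2(1) + 2\, f_2(2) \\
&= \frac{7}{18} + 2\, \frac{11}{18} \\
&= \frac{29}{18}.
\end{aligned}$$

Further, the product moment of X and Y is given by

$$\begin{aligned}
E(XY) &= \sum_{x=1}^{2} \sum_{y=1}^{2} x\, y\, f(x, y) \\
&= f(1,1) + 2 f(1,2) + 2 f(2,1) + 4 f(2,2) \\
&= \frac{3}{18} + 2\, \frac{5}{18} + 2\, \frac{4}{18} + 4\, \frac{6}{18} \\
&= \frac{3 + 10 + 8 + 24}{18} \\
&= \frac{45}{18}.
\end{aligned}$$

Hence, the covariance between X and Y is given by

$$\begin{aligned}
Cov(X, Y) &= E(XY) - E(X)\, E(Y) \\
&= \frac{45}{18} - \left( \frac{28}{18} \right) \left( \frac{29}{18} \right) \\
&= \frac{(45)(18) - (28)(29)}{(18)(18)} \\
&= \frac{810 - 812}{324} \\
&= -\frac{2}{324} \\
&= -0.00617.
\end{aligned}$$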
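
The arithmetic of Example 8.1 can be reproduced with exact fractions. This is
a small sketch that assumes the joint density f(x, y) = (x + 2y)/18 on
x = 1, 2 and y = 1, 2, as used in the computations above; the helper names
are illustrative.

```python
from fractions import Fraction
from itertools import product

def joint(x: int, y: int) -> Fraction:
    """Joint density f(x, y) = (x + 2y)/18 of Example 8.1."""
    return Fraction(x + 2 * y, 18)

support = list(product((1, 2), (1, 2)))

def expect(g) -> Fraction:
    """Expected value of g(X, Y) under the joint density."""
    return sum(g(x, y) * joint(x, y) for x, y in support)

e_x = expect(lambda x, y: x)        # 28/18
e_y = expect(lambda x, y: y)        # 29/18
e_xy = expect(lambda x, y: x * y)   # 45/18
cov = e_xy - e_x * e_y              # Theorem 8.1

print(e_x, e_y, e_xy, cov, float(cov))
```

Working with `Fraction` avoids rounding, so the covariance comes out exactly
as -2/324 = -1/162 before it is converted to the decimal -0.00617.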


Remark 8.1. For an arbitrary random variable, the product moment and 
covariance may or may not exist. Further, note that unlike variance, the 
covariance between two random variables may be negative. 

Example 8.2. Let X and Y have the joint density function 

$$f(x, y) = \begin{cases} x + y & \text{if } 0 < x, y < 1 \\ 0 & \text{elsewhere.} \end{cases}$$

What is the covariance between X and Y ? 


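Answer: The marginal density of X is

$$\begin{aligned}
f_1(x) &= \int_0^1 (x + y)\, dy \\
&= \left[ x y + \frac{y^2}{2} \right]_{y=0}^{y=1} \\
&= x + \frac{1}{2}.
\end{aligned}$$

Thus, the expected value of X is given by

$$\begin{aligned}
E(X) &= \int_0^1 x\, f_1(x)\, dx \\
&= \int_0^1 x \left( x + \frac{1}{2} \right) dx \\
&= \left[ \frac{x^3}{3} + \frac{x^2}{4} \right]_0^1 \\
&= \frac{7}{12}.
\end{aligned}$$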
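Similarly (or using the fact that the density is symmetric in x and y), we get

$$E(Y) = \frac{7}{12}.$$

Now, we compute the product moment of X and Y.

$$\begin{aligned}
E(XY) &= \int_0^1 \int_0^1 x y (x + y)\, dx\, dy \\
&= \int_0^1 \int_0^1 \left( x^2 y + x y^2 \right) dx\, dy \\
&= \int_0^1 \left[ \frac{x^3 y}{3} + \frac{x^2 y^2}{2} \right]_{x=0}^{x=1} dy \\
&= \int_0^1 \left( \frac{y}{3} + \frac{y^2}{2} \right) dy \\
&= \left[ \frac{y^2}{6} + \frac{y^3}{6} \right]_0^1 \\
&= \frac{1}{6} + \frac{1}{6} \\
&= \frac{4}{12}.
\end{aligned}$$

Hence the covariance between X and Y is given by

$$\begin{aligned}
Cov(X, Y) &= E(XY) - E(X)\, E(Y) \\
&= \frac{4}{12} - \left( \frac{7}{12} \right) \left( \frac{7}{12} \right) \\
&= \frac{48 - 49}{144} \\
&= -\frac{1}{144}.
\end{aligned}$$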
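
For the density f(x, y) = x + y on the unit square, every moment has a simple
closed form, since the integral of x^m y^n over the square is
1/((m+1)(n+1)). The sketch below uses that fact to recompute the quantities
of Example 8.2 exactly; the function `moment` is an illustrative helper, not
notation from the text.

```python
from fractions import Fraction

def moment(a: int, b: int) -> Fraction:
    """E(X^a Y^b) for f(x, y) = x + y on 0 < x, y < 1.

    The integrand x^a y^b (x + y) splits into x^(a+1) y^b + x^a y^(b+1),
    and each monomial x^m y^n integrates to 1 / ((m + 1)(n + 1)).
    """
    return Fraction(1, (a + 2) * (b + 1)) + Fraction(1, (a + 1) * (b + 2))

total_mass = moment(0, 0)   # should be 1 for a valid density
e_x = moment(1, 0)          # 7/12
e_y = moment(0, 1)          # 7/12
e_xy = moment(1, 1)         # 4/12
cov = e_xy - e_x * e_y      # -1/144

print(total_mass, e_x, e_y, e_xy, cov)
```

The same helper gives higher moments, for instance `moment(2, 0)` for E(X^2),
if the variance of X is needed later.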


Example 8.3. Let X and Y be continuous random variables with joint 
density function 

$$f(x, y) = \begin{cases} 2 & \text{if } 0 < y < 1 - x;\ 0 < x < 1 \\ 0 & \text{elsewhere.} \end{cases}$$

What is the covariance between X and Y ? 

Answer: The marginal density of X is given by

$$f_1(x) = \int_0^{1-x} 2\, dy = 2(1 - x).$$

Hence the expected value of X is

$$\mu_X = E(X) = \int_0^1 x\, f_1(x)\, dx = \int_0^1 2x(1 - x)\, dx = \frac{1}{3}.$$

Similarly, the marginal of Y is

$$f_2(y) = \int_0^{1-y} 2\, dx = 2(1 - y).$$

Hence the expected value of Y is

$$\mu_Y = E(Y) = \int_0^1 y\, f_2(y)\, dy = \int_0^1 2y(1 - y)\, dy = \frac{1}{3}.$$

The product moment of X and Y is given by

$$\begin{aligned}
E(XY) &= \int_0^1 \int_0^{1-x} x\, y\, f(x, y)\, dy\, dx \\
&= \int_0^1 \int_0^{1-x} 2 x y\, dy\, dx \\
&= 2 \int_0^1 x \left[ \frac{y^2}{2} \right]_0^{1-x} dx \\
&= 2 \cdot \frac{1}{2} \int_0^1 x (1 - x)^2\, dx \\
&= \int_0^1 \left( x - 2x^2 + x^3 \right) dx \\
&= \left[ \frac{1}{2} x^2 - \frac{2}{3} x^3 + \frac{1}{4} x^4 \right]_0^1 \\
&= \frac{1}{12}.
\end{aligned}$$

Therefore, the covariance between X and Y is given by

$$\begin{aligned}
Cov(X, Y) &= E(XY) - E(X)\, E(Y) \\
&= \frac{1}{12} - \frac{1}{9} \\
&= \frac{3}{36} - \frac{4}{36} = -\frac{1}{36}.
\end{aligned}$$
Theorem 8.2. If X and Y are any two random variables and a, b, c, and d
are real constants, then

$$Cov(aX + b,\ cY + d) = ac\, Cov(X, Y).$$

Proof:

$$\begin{aligned}
Cov(aX + b,\ cY + d) &= E\left( (aX + b)(cY + d) \right) - E(aX + b)\, E(cY + d) \\
&= E\left( acXY + adX + bcY + bd \right) - \left( aE(X) + b \right) \left( cE(Y) + d \right) \\
&= ac\,E(XY) + ad\,E(X) + bc\,E(Y) + bd \\
&\quad - \left[ ac\,E(X)\,E(Y) + ad\,E(X) + bc\,E(Y) + bd \right] \\
&= ac \left[ E(XY) - E(X)\, E(Y) \right] \\
&= ac\, Cov(X, Y).
\end{aligned}$$

Example 8.4. If the product moment of X and Y is 3 and the mean of
X and Y are both equal to 2, then what is the covariance of the random
variables $2X + 10$ and $-\frac{5}{2} Y + 3$?

Answer: Since E(XY) = 3 and E(X) = 2 = E(Y), the covariance of X
and Y is given by

$$Cov(X, Y) = E(XY) - E(X)\, E(Y) = 3 - 4 = -1.$$

Then the covariance of $2X + 10$ and $-\frac{5}{2} Y + 3$ is given by

$$\begin{aligned}
Cov\left( 2X + 10,\ -\frac{5}{2} Y + 3 \right) &= 2 \left( -\frac{5}{2} \right) Cov(X, Y) \\
&= (-5)(-1) \\
&= 5.
\end{aligned}$$
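
Theorem 8.2 can be spot-checked on any concrete distribution. The sketch
below reuses, purely as an illustrative assumption, the discrete density
(x + 2y)/18 of Example 8.1 together with the constants a = 2, b = 10,
c = -5/2, d = 3 from Example 8.4. It is a numerical check for one
distribution, not a proof.

```python
from fractions import Fraction

# Illustrative joint density (taken from Example 8.1, not from Example 8.4).
pmf = {(x, y): Fraction(x + 2 * y, 18) for x in (1, 2) for y in (1, 2)}

def expect(g) -> Fraction:
    """Expected value of g(X, Y) under pmf."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

def cov(u, v) -> Fraction:
    """Covariance of u(X, Y) and v(X, Y), computed as E(UV) - E(U) E(V)."""
    return expect(lambda x, y: u(x, y) * v(x, y)) - expect(u) * expect(v)

a, b, c, d = 2, 10, Fraction(-5, 2), 3

lhs = cov(lambda x, y: a * x + b, lambda x, y: c * y + d)
rhs = a * c * cov(lambda x, y: x, lambda x, y: y)

print(lhs, rhs)  # both sides of Theorem 8.2
```

The shifts b and d drop out entirely, and only the product ac rescales the
covariance, which is what the theorem asserts.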

Remark 8.2. Notice that Theorem 8.2 can be further improved. That
is, if X, Y, Z are three random variables, then

$$Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z)$$

and

$$Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z).$$

The first formula can be established as follows. Consider

$$\begin{aligned}
Cov(X + Y, Z) &= E\left( (X + Y) Z \right) - E(X + Y)\, E(Z) \\
&= E(XZ + YZ) - E(X)\,E(Z) - E(Y)\,E(Z) \\
&= E(XZ) - E(X)\,E(Z) + E(YZ) - E(Y)\,E(Z) \\
&= Cov(X, Z) + Cov(Y, Z).
\end{aligned}$$

8.2. Independence of Random Variables 

In this section, we study the effect of independence on the product
moment (and hence on the covariance). We begin with a simple theorem.

Theorem 8.3. If X and Y are independent random variables, then 

E(XY) = E(X)E(Y). 


Proof: Recall that X and Y are independent if and only if 

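$$f(x, y) = f_1(x)\, f_2(y).$$

Let us assume that X and Y are continuous. Therefore

$$\begin{aligned}
E(XY) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x\, y\, f(x, y)\, dx\, dy \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x\, y\, f_1(x)\, f_2(y)\, dx\, dy \\
&= \left( \int_{-\infty}^{\infty} x\, f_1(x)\, dx \right) \left( \int_{-\infty}^{\infty} y\, f_2(y)\, dy \right) \\
&= E(X)\, E(Y).
\end{aligned}$$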

If X and Y are discrete, then replace the integrals by appropriate sums to 
prove the same result. 


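Example 8.5. Let X and Y be two independent random variables with
respective density functions:

$$f(x) = \begin{cases} 3x^2 & \text{if } 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

and

$$g(y) = \begin{cases} 4y^3 & \text{if } 0 < y < 1 \\ 0 & \text{otherwise.} \end{cases}$$

What is $E\left( \frac{X}{Y} \right)$?

Answer: Since X and Y are independent, the joint density of X and Y is
given by

$$h(x, y) = f(x)\, g(y).$$

Therefore

$$\begin{aligned}
E\left( \frac{X}{Y} \right) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \frac{x}{y}\, h(x, y)\, dx\, dy \\
&= \int_0^1 \int_0^1 \frac{x}{y}\, f(x)\, g(y)\, dx\, dy \\
&= \int_0^1 \int_0^1 \frac{x}{y}\, 3x^2\, 4y^3\, dx\, dy \\
&= \left( \int_0^1 3x^3\, dx \right) \left( \int_0^1 4y^2\, dy \right) \\
&= \left( \frac{3}{4} \right) \left( \frac{4}{3} \right) \\
&= 1.
\end{aligned}$$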
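Remark 8.3. The independence of X and Y does not imply $E\left( \frac{X}{Y} \right) = \frac{E(X)}{E(Y)}$
but only implies $E\left( \frac{X}{Y} \right) = E(X)\, E\left( Y^{-1} \right)$. Further, note that $E\left( Y^{-1} \right)$ is not
equal to $\frac{1}{E(Y)}$.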

Theorem 8.4. If X and Y are independent random variables, then the 
covariance between X and Y is always zero, that is 


Cov(X,Y) = 0. 


Proof: Suppose X and Y are independent. Then by Theorem 8.3, we have
E(XY) = E(X) E(Y). Consider

$$\begin{aligned}
Cov(X, Y) &= E(XY) - E(X)\, E(Y) \\
&= E(X)\, E(Y) - E(X)\, E(Y) \\
&= 0.
\end{aligned}$$

Example 8.6. Let the random variables X and Y have the joint density 

$$f(x, y) = \begin{cases} \frac{1}{4} & \text{if } (x, y) \in \{ (0,1),\ (0,-1),\ (1,0),\ (-1,0) \} \\ 0 & \text{otherwise.} \end{cases}$$

What is the covariance of X and Y ? Are the random variables X and Y 
independent? 
Answer: The joint density of X and Y is shown in the following table with
the marginals $f_1(x)$ and $f_2(y)$.

[Figure: Domain of the Joint PDF]

    (x, y)    x = -1    x = 0    x = 1    f_2(y)
    y = -1      0        1/4       0        1/4
    y =  0     1/4        0       1/4       2/4
    y =  1      0        1/4       0        1/4
    f_1(x)     1/4       2/4      1/4


From this table, we see that 


$$0 = f(0, 0) \neq f_1(0)\, f_2(0) = \left( \frac{2}{4} \right) \left( \frac{2}{4} \right) = \frac{1}{4}$$

and thus

$$f(x, y) \neq f_1(x)\, f_2(y)$$

for all (x, y) in the range space of the joint variable (X, Y). Therefore X and
Y are not independent.

Next, we compute the covariance between X and Y. For this we need
E(X), E(Y) and E(XY). The expected value of X is

$$\begin{aligned}
E(X) &= \sum_{x=-1}^{1} x\, f_1(x) \\
&= (-1)\, f_1(-1) + (0)\, f_1(0) + (1)\, f_1(1) \\
&= -\frac{1}{4} + 0 + \frac{1}{4} \\
&= 0.
\end{aligned}$$

Similarly, the expected value of Y is

$$\begin{aligned}
E(Y) &= \sum_{y=-1}^{1} y\, f_2(y) \\
&= (-1)\, f_2(-1) + (0)\, f_2(0) + (1)\, f_2(1) \\
&= -\frac{1}{4} + 0 + \frac{1}{4} \\
&= 0.
\end{aligned}$$

The product moment of X and Y is given by

$$\begin{aligned}
E(XY) &= \sum_{x=-1}^{1} \sum_{y=-1}^{1} x\, y\, f(x, y) \\
&= (1)\, f(-1,-1) + (0)\, f(-1,0) + (-1)\, f(-1,1) \\
&\quad + (0)\, f(0,-1) + (0)\, f(0,0) + (0)\, f(0,1) \\
&\quad + (-1)\, f(1,-1) + (0)\, f(1,0) + (1)\, f(1,1) \\
&= 0.
\end{aligned}$$

Hence, the covariance between X and Y is given by

$$Cov(X, Y) = E(XY) - E(X)\, E(Y) = 0.$$


Remark 8.4. This example shows that if the covariance of X and Y is zero 
that does not mean the random variables are independent. However, we know 
from Theorem 8.4 that if X and Y are independent, then the Cov(X,Y) is 
always zero. 
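
The distinction between zero covariance and independence in Example 8.6 can
be confirmed by direct enumeration. The following sketch assumes the
four-point density of that example (mass 1/4 at each of (0,1), (0,-1), (1,0),
(-1,0)); the variable names are illustrative.

```python
from fractions import Fraction

values = (-1, 0, 1)
pmf = {pt: Fraction(1, 4) for pt in [(0, 1), (0, -1), (1, 0), (-1, 0)]}

def f(x: int, y: int) -> Fraction:
    """Joint density of Example 8.6 (zero off the four support points)."""
    return pmf.get((x, y), Fraction(0))

def f1(x: int) -> Fraction:
    """Marginal of X."""
    return sum(f(x, y) for y in values)

def f2(y: int) -> Fraction:
    """Marginal of Y."""
    return sum(f(x, y) for x in values)

e_x = sum(x * f1(x) for x in values)
e_y = sum(y * f2(y) for y in values)
e_xy = sum(x * y * f(x, y) for x in values for y in values)
cov = e_xy - e_x * e_y

# Independence requires f(x, y) = f1(x) f2(y) at every point.
independent = all(f(x, y) == f1(x) * f2(y) for x in values for y in values)

print(cov, independent)
```

The covariance is exactly zero while the factorization test fails, so the
variables are uncorrelated but dependent.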
8.3. Variance of the Linear Combination of Random Variables

Given two random variables, X and Y , we determine the variance of 
their linear combination, that is aX + bY. 

Theorem 8.5. Let X and Y be any two random variables and let a and b 
be any two real numbers. Then 

Var(aX + bY) = a 2 Var(X) + b 2 Var(Y) + 2abCov(X, Y). 


Proof: 

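$$\begin{aligned}
Var(aX + bY) &= E\left( \left[ aX + bY - E(aX + bY) \right]^2 \right) \\
&= E\left( \left[ aX + bY - a\,E(X) - b\,E(Y) \right]^2 \right) \\
&= E\left( \left[ a(X - \mu_X) + b(Y - \mu_Y) \right]^2 \right) \\
&= E\left( a^2 (X - \mu_X)^2 + b^2 (Y - \mu_Y)^2 + 2ab\,(X - \mu_X)(Y - \mu_Y) \right) \\
&= a^2 E\left( (X - \mu_X)^2 \right) + b^2 E\left( (Y - \mu_Y)^2 \right) + 2ab\, E\left( (X - \mu_X)(Y - \mu_Y) \right) \\
&= a^2\, Var(X) + b^2\, Var(Y) + 2ab\, Cov(X, Y).
\end{aligned}$$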
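
As a sanity check of Theorem 8.5, both sides of the identity can be evaluated
on a small discrete distribution. The sketch below assumes, for illustration
only, the density (x + 2y)/18 from Example 8.1 and the arbitrary constants
a = 3 and b = -2.

```python
from fractions import Fraction

# Illustrative joint density borrowed from Example 8.1.
pmf = {(x, y): Fraction(x + 2 * y, 18) for x in (1, 2) for y in (1, 2)}

def expect(g) -> Fraction:
    """Expected value of g(X, Y)."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

def var(g) -> Fraction:
    """Variance of g(X, Y), computed as E(g^2) - E(g)^2."""
    return expect(lambda x, y: g(x, y) ** 2) - expect(g) ** 2

def cov(u, v) -> Fraction:
    """Covariance of u(X, Y) and v(X, Y)."""
    return expect(lambda x, y: u(x, y) * v(x, y)) - expect(u) * expect(v)

def X(x, y):
    return x

def Y(x, y):
    return y

a, b = 3, -2  # arbitrary real constants chosen for the check
lhs = var(lambda x, y: a * x + b * y)
rhs = a**2 * var(X) + b**2 * var(Y) + 2 * a * b * cov(X, Y)

print(lhs, rhs)
```

Because the covariance here is negative and ab is negative, the cross term
2ab Cov(X, Y) increases the variance of the combination slightly.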

Example 8.7. If Var(X + Y) = 3, Var{X — Y) = 1, E(X) = 1 and 
E(Y) = 2, then what is E(XY) ? 

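Answer:

$$\begin{aligned}
Var(X + Y) &= \sigma_X^2 + \sigma_Y^2 + 2\, Cov(X, Y), \\
Var(X - Y) &= \sigma_X^2 + \sigma_Y^2 - 2\, Cov(X, Y).
\end{aligned}$$

Hence, we get

$$\begin{aligned}
Cov(X, Y) &= \frac{1}{4} \left[ Var(X + Y) - Var(X - Y) \right] \\
&= \frac{1}{4} \left[ 3 - 1 \right] \\
&= \frac{1}{2}.
\end{aligned}$$

Therefore, the product moment of X and Y is given by

$$\begin{aligned}
E(XY) &= Cov(X, Y) + E(X)\, E(Y) \\
&= \frac{1}{2} + (1)(2) \\
&= \frac{5}{2}.
\end{aligned}$$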
Example 8.8. Let X and Y be random variables with Var(X) = 4,
Var(Y) = 9 and Var(X - Y) = 16. What is Cov(X, Y)?

Answer:

$$\begin{aligned}
Var(X - Y) &= Var(X) + Var(Y) - 2\, Cov(X, Y) \\
16 &= 4 + 9 - 2\, Cov(X, Y).
\end{aligned}$$

Hence

$$Cov(X, Y) = -\frac{3}{2}.$$

Remark 8.5. Theorem 8.5 can be extended to three or more random
variables. In the case of three random variables X, Y, Z, we have

Var{X + Y + Z ) 

= Var(X) + Var(Y) + Var(Z) 

+ 2Cov(X, Y) + 2 Cov(Y, Z) + 2 Cov(Z, X). 

To see this consider 

Var(X + Y + Z) 

= Var((X + Y) + Z) 

= Var(X + Y) + Var(Z) + 2Cov(X + Y, Z) 

= Var(X + Y) + Var(Z) + 2Cov(X, Z) + 2Cov(Y, Z) 

= Var(X) + Var(Y) + 2 Cov(X, Y) 

+ Var(Z) + 2Cov(X, Z) + 2Cov(Y, Z)

= Var(X) + Var(Y) + Var(Z) 

+ 2Cov(X, Y) + 2Cov(Y, Z) + 2Cov(Z, X). 

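Theorem 8.6. If X and Y are independent random variables with E(X) =
0 = E(Y), then

$$Var(XY) = Var(X)\, Var(Y).$$

Proof:

$$\begin{aligned}
Var(XY) &= E\left( (XY)^2 \right) - \left( E(X)\, E(Y) \right)^2 \\
&= E\left( (XY)^2 \right) \\
&= E\left( X^2 Y^2 \right) \\
&= E\left( X^2 \right) E\left( Y^2 \right) \quad \text{(by independence of X and Y)} \\
&= Var(X)\, Var(Y).
\end{aligned}$$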
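Example 8.9. Let X and Y be independent random variables, each with
density

$$f(x) = \begin{cases} \frac{1}{2\theta} & \text{for } -\theta < x < \theta \\ 0 & \text{otherwise.} \end{cases}$$

If $Var(XY) = \frac{64}{9}$, then what is the value of $\theta$?

Answer:

$$E(X) = \int_{-\theta}^{\theta} \frac{1}{2\theta}\, x\, dx = \frac{1}{2\theta} \left[ \frac{x^2}{2} \right]_{-\theta}^{\theta} = 0.$$

Since Y has the same density, we conclude that E(Y) = 0. Hence

$$\begin{aligned}
\frac{64}{9} &= Var(XY) \\
&= Var(X)\, Var(Y) \\
&= \left( \int_{-\theta}^{\theta} \frac{1}{2\theta}\, x^2\, dx \right) \left( \int_{-\theta}^{\theta} \frac{1}{2\theta}\, y^2\, dy \right) \\
&= \left( \frac{\theta^2}{3} \right) \left( \frac{\theta^2}{3} \right) \\
&= \frac{\theta^4}{9}.
\end{aligned}$$

Hence, we obtain

$$\theta^4 = 64 \quad \text{or} \quad \theta = 2\sqrt{2}.$$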
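
The last step of Example 8.9 is a one-line computation. The sketch below
assumes the uniform density on (-theta, theta), whose variance is
theta^2 / 3, and the stated value Var(XY) = 64/9; the function names are
illustrative.

```python
import math

target = 64 / 9  # Var(XY) as given in Example 8.9

def var_uniform(theta: float) -> float:
    """Variance of a uniform random variable on (-theta, theta)."""
    return theta**2 / 3

def var_product(theta: float) -> float:
    """Var(XY) for independent zero-mean X and Y, by Theorem 8.6."""
    return var_uniform(theta) * var_uniform(theta)

# Solve theta^4 / 9 = target for the positive root.
theta = (9 * target) ** 0.25

print(theta, 2 * math.sqrt(2), var_product(theta))
```

Only the positive root is kept because theta is the half-width of the
support and must be positive.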


8.4. Correlation and Independence 

The functional dependency of the random variable Y on the random
variable X can be obtained by examining the correlation coefficient. The
definition of the correlation coefficient $\rho$ between X and Y is given below.

Definition 8.3. Let X and Y be two random variables with variances $\sigma_X^2$
and $\sigma_Y^2$, respectively. Let the covariance of X and Y be Cov(X, Y). Then
the correlation coefficient $\rho$ between X and Y is given by

$$\rho = \frac{Cov(X, Y)}{\sigma_X\, \sigma_Y}.$$


Theorem 8.7. If X and Y are independent, the correlation coefficient
between X and Y is zero.

Proof:

$$\begin{aligned}
\rho &= \frac{Cov(X, Y)}{\sigma_X\, \sigma_Y} \\
&= \frac{0}{\sigma_X\, \sigma_Y} \\
&= 0.
\end{aligned}$$


Remark 8.4. The converse of this theorem is not true. If the correlation 
coefficient of X and Y is zero, then X and Y are said to be uncorrelated. 

Lemma 8.1. If X* and Y* are the standardizations of the random variables 
X and Y, respectively, the correlation coefficient between X* and Y * is equal 
to the correlation coefficient between X and Y. 

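Proof: Let $\rho^*$ be the correlation coefficient between X* and Y*. Further,
let $\rho$ denote the correlation coefficient between X and Y. We will show that
$\rho^* = \rho$. Consider

$$\begin{aligned}
\rho^* &= \frac{Cov(X^*, Y^*)}{\sigma_{X^*}\, \sigma_{Y^*}} \\
&= Cov(X^*, Y^*) \\
&= Cov\left( \frac{X - \mu_X}{\sigma_X},\ \frac{Y - \mu_Y}{\sigma_Y} \right) \\
&= \frac{1}{\sigma_X\, \sigma_Y}\, Cov(X - \mu_X,\ Y - \mu_Y) \\
&= \frac{Cov(X, Y)}{\sigma_X\, \sigma_Y} \\
&= \rho.
\end{aligned}$$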


This lemma states that the value of the correlation coefficient between 
two random variables does not change by standardization of them. 

Theorem 8.8. For any random variables X and Y, the correlation coefficient
$\rho$ satisfies

$$-1 \le \rho \le 1,$$

and $\rho = 1$ or $\rho = -1$ implies that the random variable Y = aX + b, for some
real constants a and b with $a \neq 0$.

Proof: Let $\mu_X$ be the mean of X and $\mu_Y$ be the mean of Y, and $\sigma_X^2$ and $\sigma_Y^2$
be the variances of X and Y, respectively. Further, let

$$X^* = \frac{X - \mu_X}{\sigma_X} \quad \text{and} \quad Y^* = \frac{Y - \mu_Y}{\sigma_Y}$$

be the standardization of X and Y, respectively. Then

$$\mu_{X^*} = 0 \quad \text{and} \quad \sigma_{X^*}^2 = 1,$$

and

$$\mu_{Y^*} = 0 \quad \text{and} \quad \sigma_{Y^*}^2 = 1.$$

Thus

$$\begin{aligned}
Var(X^* - Y^*) &= Var(X^*) + Var(Y^*) - 2\, Cov(X^*, Y^*) \\
&= \sigma_{X^*}^2 + \sigma_{Y^*}^2 - 2\, \rho^*\, \sigma_{X^*}\, \sigma_{Y^*} \\
&= 1 + 1 - 2\rho^* \\
&= 1 + 1 - 2\rho \quad \text{(by Lemma 8.1)} \\
&= 2(1 - \rho).
\end{aligned}$$

Since the variance of a random variable is always nonnegative, we get

$$2(1 - \rho) \ge 0$$

which is

$$\rho \le 1.$$

By a similar argument, using Var(X* + Y*), one can show that $-1 \le \rho$.
Hence, we have $-1 \le \rho \le 1$. Now, we show that if $\rho = 1$ or $\rho = -1$, then Y
and X are related through an affine transformation. Consider the case $\rho = 1$,
then

$$Var(X^* - Y^*) = 0.$$

But if the variance of a random variable is 0, then all the probability mass is
concentrated at a point (that is, the distribution of the corresponding random
variable is degenerate). Thus Var(X* - Y*) = 0 implies X* - Y* takes only
one value. But E[X* - Y*] = 0. Thus, we get

$$X^* - Y^* = 0$$

or

$$X^* = Y^*.$$

Hence

$$\frac{X - \mu_X}{\sigma_X} = \frac{Y - \mu_Y}{\sigma_Y}.$$

Solving this for Y in terms of X, we get

$$Y = aX + b$$

where

$$a = \frac{\sigma_Y}{\sigma_X} \quad \text{and} \quad b = \mu_Y - a\, \mu_X.$$

Thus if $\rho = 1$, then Y is linear in X. Similarly, we can show for the case
$\rho = -1$, the random variables X and Y are linearly related. This completes
the proof of the theorem.
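
The boundary cases of Theorem 8.8 are easy to observe numerically. The sketch
below treats a short list of numbers as the equally likely values of X (an
illustrative assumption) and computes the correlation coefficient between X
and three transformations of it.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient of paired, equally likely values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

xs = [1.0, 2.0, 4.0, 7.0]  # hypothetical equally likely values of X

rho_pos = correlation(xs, [3 * x + 2 for x in xs])       # Y = aX + b, a > 0
rho_neg = correlation(xs, [-0.5 * x + 1 for x in xs])    # Y = aX + b, a < 0
rho_other = correlation(xs, [x ** 2 for x in xs])        # not affine in X

print(rho_pos, rho_neg, rho_other)
```

An affine relation with positive slope gives a coefficient of 1 and one with
negative slope gives -1 (up to rounding), while the quadratic relation gives
a value strictly inside the interval.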

8.5. Moment Generating Functions 

Similar to the moment generating function for the univariate case, one
can define the moment generating function for the bivariate case to
compute the various product moments. The moment generating function for the
bivariate case is defined as follows:

Definition 8.4. Let X and Y be two random variables with joint density
function f(x,y). A real valued function $M : \mathbb{R}^2 \to \mathbb{R}$ defined by

$$M(s, t) = E\left( e^{sX + tY} \right)$$

is called the joint moment generating function of X and Y if this expected
value exists for all s in some interval $-h < s < h$ and for all t in some interval
$-k < t < k$ for some positive h and k.

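It is easy to see from this definition that

$$M(s, 0) = E\left( e^{sX} \right)$$

and

$$M(0, t) = E\left( e^{tY} \right).$$

From this we see that

$$E(X^k) = \left. \frac{\partial^k M(s, t)}{\partial s^k} \right|_{(0,0)}, \qquad E(Y^k) = \left. \frac{\partial^k M(s, t)}{\partial t^k} \right|_{(0,0)},$$

for k = 1, 2, 3, 4, ...; and

$$E(XY) = \left. \frac{\partial^2 M(s, t)}{\partial s\, \partial t} \right|_{(0,0)}.$$
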


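Example 8.10. Let the random variables X and Y have the joint density

$$f(x, y) = \begin{cases} e^{-y} & \text{for } 0 < x < y < \infty \\ 0 & \text{otherwise.} \end{cases}$$

What is the joint moment generating function for X and Y?

Answer: The joint moment generating function of X and Y is given by

$$\begin{aligned}
M(s, t) &= E\left( e^{sX + tY} \right) \\
&= \int_0^\infty \int_x^\infty e^{sx + ty}\, e^{-y}\, dy\, dx \\
&= \int_0^\infty e^{sx}\, \frac{e^{-(1-t)x}}{1 - t}\, dx \\
&= \frac{1}{(1 - t)(1 - s - t)},
\end{aligned}$$

provided $t < 1$ and $s + t < 1$.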

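Example 8.11. If the joint moment generating function of the random
variables X and Y is

$$M(s, t) = e^{\left( s + 3t + 2s^2 + 18t^2 + 12st \right)},$$

what is the covariance of X and Y?

Answer:

$$\begin{aligned}
M(s, t) &= e^{\left( s + 3t + 2s^2 + 18t^2 + 12st \right)} \\
\frac{\partial M}{\partial s} &= (1 + 4s + 12t)\, M(s, t) \\
\left. \frac{\partial M}{\partial s} \right|_{(0,0)} &= 1\, M(0, 0) = 1.
\end{aligned}$$

$$\begin{aligned}
\frac{\partial M}{\partial t} &= (3 + 36t + 12s)\, M(s, t) \\
\left. \frac{\partial M}{\partial t} \right|_{(0,0)} &= 3\, M(0, 0) = 3.
\end{aligned}$$

Hence

$$\mu_X = 1 \quad \text{and} \quad \mu_Y = 3.$$

Now we compute the product moment of X and Y.

$$\begin{aligned}
\frac{\partial^2 M(s, t)}{\partial s\, \partial t} &= \frac{\partial}{\partial t} \left( \frac{\partial M}{\partial s} \right) \\
&= \frac{\partial}{\partial t} \left( M(s, t)\, (1 + 4s + 12t) \right) \\
&= (1 + 4s + 12t)\, \frac{\partial M}{\partial t} + M(s, t)\, (12).
\end{aligned}$$

Therefore

$$\left. \frac{\partial^2 M(s, t)}{\partial s\, \partial t} \right|_{(0,0)} = 1(3) + 1(12).$$

Thus

$$E(XY) = 15$$

and the covariance of X and Y is given by

$$\begin{aligned}
Cov(X, Y) &= E(XY) - E(X)\, E(Y) \\
&= 15 - (3)(1) \\
&= 12.
\end{aligned}$$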

Theorem 8.9. If X and Y are independent then

$$M_{aX+bY}(t) = M_X(at)\, M_Y(bt)$$

where a and b are real parameters.

Proof: Let W = aX + bY. Hence

$$\begin{aligned}
M_{aX+bY}(t) &= M_W(t) \\
&= E\left( e^{tW} \right) \\
&= E\left( e^{t(aX + bY)} \right) \\
&= E\left( e^{taX}\, e^{tbY} \right) \\
&= E\left( e^{taX} \right) E\left( e^{tbY} \right) \quad \text{(by Theorem 8.3)} \\
&= M_X(at)\, M_Y(bt).
\end{aligned}$$

This theorem is very powerful. It helps us to find the distribution of a 
linear combination of independent random variables. The following examples 
illustrate how one can use this theorem to determine distribution of a linear 
combination. 

Example 8.12. Suppose the random variable X is normal with mean 2 and 
standard deviation 3 and the random variable Y is also normal with mean 
0 and standard deviation 4. If X and Y are independent, then what is the 
probability distribution of the random variable X + Y ? 

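Answer: Since $X \sim N(2, 9)$, the moment generating function of X is given
by

$$M_X(t) = e^{\mu t + \frac{1}{2} \sigma^2 t^2} = e^{2t + \frac{9}{2} t^2}.$$

Similarly, since $Y \sim N(0, 16)$,

$$M_Y(t) = e^{\mu t + \frac{1}{2} \sigma^2 t^2} = e^{\frac{16}{2} t^2}.$$

Since X and Y are independent, the moment generating function of X + Y
is given by

$$\begin{aligned}
M_{X+Y}(t) &= M_X(t)\, M_Y(t) \\
&= e^{2t + \frac{9}{2} t^2}\, e^{\frac{16}{2} t^2} \\
&= e^{2t + \frac{25}{2} t^2}.
\end{aligned}$$

Hence $X + Y \sim N(2, 25)$. Thus, X + Y has a normal distribution with mean
2 and variance 25. From this information we can find the probability density
function of W = X + Y as

$$f(w) = \frac{1}{\sqrt{50\pi}}\, e^{-\frac{1}{2} \left( \frac{w - 2}{5} \right)^2}, \qquad -\infty < w < \infty.$$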
Remark 8.6. In fact if X and Y are independent normal random variables
with means $\mu_X$ and $\mu_Y$ and variances $\sigma_X^2$ and $\sigma_Y^2$, respectively, then aX + bY
is also normal with mean $a\mu_X + b\mu_Y$ and variance $a^2 \sigma_X^2 + b^2 \sigma_Y^2$.


Example 8.13. Let X and Y be two independent and identically distributed 
random variables. If their common distribution is chi-square with one degree 
of freedom, then what is the distribution of X + Y ? What is the moment 
generating function of X — Y ? 


Answer: Since X and Y are both $\chi^2(1)$, the moment generating functions
are

$$M_X(t) = \frac{1}{\sqrt{1 - 2t}}$$

and

$$M_Y(t) = \frac{1}{\sqrt{1 - 2t}}.$$

Since the random variables X and Y are independent, the moment
generating function of X + Y is given by

$$\begin{aligned}
M_{X+Y}(t) &= M_X(t)\, M_Y(t) \\
&= \frac{1}{\sqrt{1 - 2t}}\, \frac{1}{\sqrt{1 - 2t}} \\
&= \frac{1}{(1 - 2t)^{\frac{2}{2}}}.
\end{aligned}$$

Hence $X + Y \sim \chi^2(2)$. Thus, if X and Y are independent chi-square random
variables, then their sum is also a chi-square random variable.

Next, we show that X - Y is not a chi-square random variable, even if
X and Y are both chi-square.

$$\begin{aligned}
M_{X-Y}(t) &= M_X(t)\, M_Y(-t) \\
&= \frac{1}{\sqrt{1 - 2t}}\, \frac{1}{\sqrt{1 + 2t}} \\
&= \frac{1}{\sqrt{1 - 4t^2}}.
\end{aligned}$$

This moment generating function does not correspond to the moment
generating function of a chi-square random variable with any number of
degrees of freedom. Further, it does not correspond to that of any of the
familiar distributions.

Remark 8.7. If X and Y are chi-square and independent random variables, 
then their linear combination is not necessarily a chi-square random variable. 
Example 8.14. Let X and Y be two independent Bernoulli random variables
with parameter p. What is the distribution of X + Y?

Answer: Since X and Y are Bernoulli with parameter p, their moment
generating functions are

$$M_X(t) = (1 - p) + p\, e^t, \qquad M_Y(t) = (1 - p) + p\, e^t.$$

Since X and Y are independent, the moment generating function of their
sum is the product of their moment generating functions, that is

$$\begin{aligned}
M_{X+Y}(t) &= M_X(t)\, M_Y(t) \\
&= \left( 1 - p + p\, e^t \right) \left( 1 - p + p\, e^t \right) \\
&= \left( 1 - p + p\, e^t \right)^2.
\end{aligned}$$

Hence $X + Y \sim BIN(2, p)$. Thus the sum of two independent Bernoulli
random variables is a binomial random variable with parameters 2 and p.
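
The conclusion of Example 8.14 can also be reached without moment generating
functions, by convolving the two Bernoulli densities directly. The sketch
below does this with exact fractions; the value p = 3/10 is an arbitrary
illustrative choice.

```python
from fractions import Fraction
from math import comb

def convolve(pmf_a: dict, pmf_b: dict) -> dict:
    """Density of the sum of two independent discrete random variables."""
    out = {}
    for a, prob_a in pmf_a.items():
        for b, prob_b in pmf_b.items():
            out[a + b] = out.get(a + b, 0) + prob_a * prob_b
    return out

p = Fraction(3, 10)  # illustrative success probability
bernoulli = {0: 1 - p, 1: p}

sum_pmf = convolve(bernoulli, bernoulli)
binomial = {k: comb(2, k) * p**k * (1 - p) ** (2 - k) for k in range(3)}

print(sum_pmf)
print(binomial)
```

Both dictionaries list the same probabilities for the values 0, 1 and 2,
which is the BIN(2, p) density.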

8.6. Review Exercises 

1. Suppose that $X_1$ and $X_2$ are random variables with zero mean and unit
variance. If the correlation coefficient of $X_1$ and $X_2$ is -0.5, then what is the
variance of $Y = \sum_{k=1}^{2} k^2 X_k$?

2. If the joint density of the random variables X and Y is 
if (x, y) ∈ { (x, 0), (0, -y) | x, y = -2, -1, 1, 2 }
otherwise, 

what is the covariance of X and Y ? Are X and Y independent? 

3. Suppose the random variables X and Y are independent and identically 
distributed. Let Z = aX + Y. If the correlation coefficient between X and 
Z is |, then what is the value of the constant a ? 

4. Let X and Y be two independent random variables with chi-square
distribution with 2 degrees of freedom. What is the moment generating function
of the random variable 2X + 3Y? If possible, what is the distribution of
2X + 3Y?

5. Let X and Y be two independent random variables. If X ~ BIN(n, p) 
and Y ~ BIN(m, p), then what is the distribution of X + Y ? 
6. Let X and Y be two independent random variables. If X and Y are
both standard normal, then what is the distribution of the random variable 
I (X 2 + Y 2 ) ? 

7. If the joint probability density function of X and Y is 

$$f(x, y) = \begin{cases} 1 & \text{if } 0 < x < 1;\ 0 < y < 1 \\ 0 & \text{elsewhere,} \end{cases}$$

then what is the joint moment generating function of X and Y ? 

8. Let the joint density function of X and Y be

$$f(x, y) = \begin{cases} \frac{1}{36} & \text{if } 1 \le x = y \le 6 \\ \frac{2}{36} & \text{if } 1 \le x < y \le 6. \end{cases}$$

What is the correlation coefficient of X and Y?


9. Suppose that X and Y are random variables with joint moment generating 
function 


M(s,t ) 



for all real s and t. What is the covariance of X and Y ? 


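10. Suppose that X and Y are random variables with joint density function

$$f(x, y) = \begin{cases} \frac{1}{6\pi} & \text{for } \frac{x^2}{4} + \frac{y^2}{9} \le 1 \\ 0 & \text{for } \frac{x^2}{4} + \frac{y^2}{9} > 1. \end{cases}$$

What is the covariance of X and Y? Are X and Y independent?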

11. Let X and Y be two random variables. Suppose E(X) = 1, E{Y ) = 2, 
Var(X) = 1, Var(Y) = 2, and Cov(X,Y) = For what values of the 
constants a and b, the random variable aX + bY, whose expected value is 3, 
has minimum variance? 

12. A box contains 5 white balls and 3 black balls. Draw 2 balls without 
replacement. If X represents the number of white balls and Y represents the 
number of black balls drawn, what is the covariance of X and Y ? 

13. If X represents the number of 1's and Y represents the number of 5's in
three tosses of a fair six-sided die, what is the correlation between X and Y?
14. Let Y and Z be two random variables. If Var(Y) = 4, Var(Z) = 16,
and Cov(Y, Z) = 2, then what is Var(3Z - 2Y)?

15. Three random variables $X_1, X_2, X_3$ have equal variances $\sigma^2$ and
coefficient of correlation between $X_1$ and $X_2$ of $\rho$ and between $X_1$ and $X_3$ and
between $X_2$ and $X_3$ of zero. What is the correlation between Y and Z where
$Y = X_1 + X_2$ and $Z = X_2 + X_3$?

16 . If X and Y are two independent Bernoulli random variables with pa¬ 
rameter p , then what is the joint moment generating function of X — Y ? 

17 . If X -\, X 2 . .... X n are normal random variables with variance cr 2 and 
covariance between any pair of random variables pa 2 , what is the variance 
of i (Xr + X 2 + • • • + X„) ? 

18 . The coefficient of correlation between X and Y is | and a 2 x = a , 
CTy = 4a, and a 2 z = 114 where Z = 3X — 4 Y. What is the value of the 
constant a ? 

19. Let X and Y be independent random variables with E(X) = 1, E(Y) =
2, and $Var(X) = Var(Y) = \sigma^2$. For what value of the constant k is the
expected value of the random variable $k(X^2 - Y^2) + Y^2$ equal to $\sigma^2$?

20. Let X be a random variable with finite variance. If Y = 15 — X, then 
what is the coefficient of correlation between the random variables X and 
(X + Y)X ? 





Chapter 9

CONDITIONAL EXPECTATION OF BIVARIATE RANDOM VARIABLES


This chapter examines the conditional mean and conditional variance 
associated with two random variables. The conditional mean is very useful 
in Bayesian estimation of parameters with a square loss function. Further, the 
notion of conditional mean sets the path for regression analysis in statistics. 

9.1. Conditional Expected Values 

Let X and Y be any two random variables with joint density f(x,y).
Recall that the conditional probability density of X, given the event Y = y,
is defined as

$$g(x/y) = \frac{f(x, y)}{f_2(y)}, \qquad f_2(y) > 0$$

where $f_2(y)$ is the marginal probability density of Y. Similarly, the
conditional probability density of Y, given the event X = x, is defined as

$$h(y/x) = \frac{f(x, y)}{f_1(x)}, \qquad f_1(x) > 0$$

where $f_1(x)$ is the marginal probability density of X.

Definition 9.1. The conditional mean of X given Y = y is defined as 


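$$\mu_{X|y} = E(X \mid y),$$

where

$$E(X \mid y) = \begin{cases} \sum_{x \in R_X} x\, g(x/y) & \text{if } X \text{ is discrete} \\ \int_{-\infty}^{\infty} x\, g(x/y)\, dx & \text{if } X \text{ is continuous.} \end{cases}$$

Similarly, the conditional mean of $Y$ given $X = x$ is defined as

$$\mu_{Y|x} = E(Y \mid x),$$

where

$$E(Y \mid x) = \begin{cases} \sum_{y \in R_Y} y\, h(y/x) & \text{if } Y \text{ is discrete} \\ \int_{-\infty}^{\infty} y\, h(y/x)\, dy & \text{if } Y \text{ is continuous.} \end{cases}$$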

Example 9.1. Let $X$ and $Y$ be discrete random variables with joint probability
density function

$$f(x,y) = \begin{cases} \frac{1}{21}(x + y) & \text{for } x = 1,2,3;\ y = 1,2 \\ 0 & \text{otherwise.} \end{cases}$$

What is the conditional mean of X given Y = y, that is E(X\y)? 

Answer: To compute the conditional mean of $X$ given $Y = y$, we need the
conditional density $g(x/y)$ of $X$ given $Y = y$. However, to find $g(x/y)$, we
need to know the marginal of $Y$, that is $f_2(y)$. Thus, we begin with

$$f_2(y) = \sum_{x=1}^{3} \frac{1}{21}(x + y) = \frac{1}{21}(6 + 3y).$$

Therefore, the conditional density of $X$ given $Y = y$ is given by

$$g(x/y) = \frac{f(x,y)}{f_2(y)} = \frac{x + y}{6 + 3y}, \qquad x = 1,2,3.$$





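The conditional expected value of $X$ given the event $Y = y$ is

$$\begin{aligned}
E(X \mid y) &= \sum_{x \in R_X} x\, g(x/y) \\
&= \sum_{x=1}^{3} x\, \frac{x + y}{6 + 3y} \\
&= \frac{1}{6 + 3y} \left[ \sum_{x=1}^{3} x^2 + y \sum_{x=1}^{3} x \right] \\
&= \frac{14 + 6y}{6 + 3y}, \qquad y = 1,2.
\end{aligned}$$

[Figure: Conditional Expectation of X given Y = y]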


Remark 9.1. Note that the conditional mean of $X$ given $Y = y$ depends
only on $y$, that is, $E(X \mid y)$ is a function $\phi$ of $y$. In the above example, this
function $\phi$ is a rational function, namely $\phi(y) = \frac{14 + 6y}{6 + 3y}$.
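The computation in Example 9.1 can be checked mechanically. The following is a minimal sketch, assuming the joint density $f(x,y) = \frac{1}{21}(x+y)$ on $x = 1,2,3$; $y = 1,2$; the function names are illustrative and not part of the text. It uses exact rational arithmetic so the result can be compared directly with $\frac{14+6y}{6+3y}$.

```python
from fractions import Fraction

def joint_pmf(x, y):
    """Joint density of Example 9.1: (x + y)/21 on x in {1,2,3}, y in {1,2}."""
    if x in (1, 2, 3) and y in (1, 2):
        return Fraction(x + y, 21)
    return Fraction(0)

def conditional_mean_x_given_y(y, x_values=(1, 2, 3)):
    """E(X | Y = y) = sum over x of x * f(x, y) / f_2(y)."""
    marginal_y = sum(joint_pmf(x, y) for x in x_values)  # f_2(y)
    return sum(x * joint_pmf(x, y) for x in x_values) / marginal_y

for y in (1, 2):
    # Compare with the closed form (14 + 6y)/(6 + 3y).
    print(y, conditional_mean_x_given_y(y), Fraction(14 + 6 * y, 6 + 3 * y))
```

Both columns agree for $y = 1$ and $y = 2$, which is all the closed form claims, since $Y$ takes no other values.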

Example 9.2. Let X and Y have the joint density function 

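$$f(x,y) = \begin{cases} x + y & \text{for } 0 < x, y < 1 \\ 0 & \text{otherwise.} \end{cases}$$

What is the conditional mean $E\left(Y \mid X = \frac{1}{3}\right)$?

Answer: The marginal density of $X$ is

$$f_1(x) = \int_0^1 (x + y)\, dy = \left[ xy + \frac{y^2}{2} \right]_0^1 = x + \frac{1}{2}.$$

Hence the conditional density of $Y$ given $X = x$ is $h(y/x) = \frac{x + y}{x + \frac{1}{2}}$ for $0 < y < 1$, and

$$E\left(Y \mid X = \tfrac{1}{3}\right) = \int_0^1 y\, \frac{\frac{1}{3} + y}{\frac{5}{6}}\, dy = \frac{6}{5} \left[ \frac{y^2}{6} + \frac{y^3}{3} \right]_0^1 = \frac{6}{5} \cdot \frac{1}{2} = \frac{3}{5}.$$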


The mean of the random variable $Y$ is a deterministic number. The
conditional mean of $Y$ given $X = x$, that is $E(Y \mid x)$, is a function $\phi(x)$ of
the variable $x$. Using this function, we form $\phi(X)$. This function $\phi(X)$ is a
random variable. Thus starting from the deterministic function $E(Y \mid x)$, we
have formed the random variable $E(Y \mid X) = \phi(X)$. An important property
of conditional expectation is given by the following theorem.

Theorem 9.1. The expected value of the random variable $E(Y \mid X)$ is equal
to the expected value of $Y$, that is

$$E_X\left( E_{Y|X}(Y \mid X) \right) = E_Y(Y),$$

where $E_X(X)$ stands for the expectation of $X$ with respect to the distribution
of $X$ and $E_{Y|X}(Y \mid X)$ stands for the expected value of $Y$ with respect to the
conditional density $h(y/X)$.


Proof: We prove this theorem for continuous variables and leave the discrete 
case to the reader. 


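$$\begin{aligned}
E_X\left( E_{Y|X}(Y \mid X) \right) &= E_X\left[ \int_{-\infty}^{\infty} y\, h(y/X)\, dy \right] \\
&= \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} y\, h(y/x)\, dy \right) f_1(x)\, dx \\
&= \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} y\, h(y/x)\, f_1(x)\, dy \right) dx \\
&= \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} h(y/x)\, f_1(x)\, dx \right) y\, dy \\
&= \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} f(x,y)\, dx \right) y\, dy \\
&= \int_{-\infty}^{\infty} y\, f_2(y)\, dy \\
&= E_Y(Y).
\end{aligned}$$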


Example 9.3. An insect lays $Y$ number of eggs, where $Y$ has a Poisson
distribution with parameter $\lambda$. If the probability of each egg surviving is $p$,
then on the average how many eggs will survive?

Answer: Let $X$ denote the number of surviving eggs. Then, given that
$Y = y$ (that is, given that the insect has laid $y$ eggs) the random variable $X$
has a binomial distribution with parameters $y$ and $p$. Thus

$$X \mid Y \sim BIN(Y, p), \qquad Y \sim POI(\lambda).$$

Therefore, the expected number of survivors is given by

$$\begin{aligned}
E_X(X) &= E_Y\left( E_{X|Y}(X \mid Y) \right) \\
&= E_Y(pY) \qquad (\text{since } X \mid Y \sim BIN(Y, p)) \\
&= p\, E_Y(Y) \\
&= p\lambda. \qquad (\text{since } Y \sim POI(\lambda))
\end{aligned}$$
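The result $E(X) = p\lambda$ of Example 9.3 can be illustrated by simulation. The sketch below is only a sanity check under assumed values $\lambda = 4$ and $p = 0.3$ (these numbers, the function names and the seed are hypothetical, chosen for illustration); the Poisson sampler is Knuth's multiplication method, which is adequate for small $\lambda$.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def mean_survivors(lam, p, trials=50_000, seed=0):
    """Average number of surviving eggs when Y ~ POI(lam) and X | Y ~ BIN(Y, p)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        eggs = sample_poisson(lam, rng)                       # Y
        total += sum(rng.random() < p for _ in range(eggs))   # X given Y
    return total / trials

estimate = mean_survivors(lam=4.0, p=0.3)
print(estimate, "compared with p * lambda =", 0.3 * 4.0)
```

The simulated average should lie close to $p\lambda = 1.2$; the agreement is statistical, not exact.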


Definition 9.2. A random variable X is said to have a mixture distribution 
if the distribution of X depends on a quantity which also has a distribution. 
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Example 9.4. A fair coin is tossed. If a head occurs, 1 die is rolled; if a tail 
occurs, 2 dice are rolled. Let Y be the total on the die or dice. What is the 
expected value of $Y$?

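Answer: Let $X$ denote the outcome of tossing a coin. Then $X \sim BER(p)$,
where the probability of success is $p = \frac{1}{2}$. Hence

$$\begin{aligned}
E_Y(Y) &= E_X\left( E_{Y|X}(Y \mid X) \right) \\
&= \tfrac{1}{2}\, E_{Y|X}(Y \mid X = 0) + \tfrac{1}{2}\, E_{Y|X}(Y \mid X = 1) \\
&= \tfrac{1}{2} \left( \frac{1 + 2 + 3 + 4 + 5 + 6}{6} + \frac{2 + 6 + 12 + 20 + 30 + 42 + 40 + 36 + 30 + 22 + 12}{36} \right) \\
&= \tfrac{1}{2} \left( \frac{126}{36} + \frac{252}{36} \right) \\
&= \frac{378}{72} \\
&= 5.25.
\end{aligned}$$

Note that the expected number of dots that show when 1 die is rolled is $\frac{126}{36}$,
and the expected number of dots that show when 2 dice are rolled is $\frac{252}{36}$.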
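The arithmetic of Example 9.4 can be reproduced by enumerating the dice outcomes. This is a small sketch with exact fractions; the helper name `expected_total` is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

def expected_total(num_dice):
    """Exact expected total when num_dice fair dice are rolled."""
    outcomes = list(product(range(1, 7), repeat=num_dice))
    return Fraction(sum(sum(roll) for roll in outcomes), len(outcomes))

mean_one_die = expected_total(1)    # conditional mean when one die is rolled
mean_two_dice = expected_total(2)   # conditional mean when two dice are rolled

# Theorem 9.1: weight each conditional mean by the probability of its branch.
mean_y = Fraction(1, 2) * mean_one_die + Fraction(1, 2) * mean_two_dice
print(mean_one_die, mean_two_dice, mean_y)
```

The two conditional means are $\frac{7}{2} = \frac{126}{36}$ and $7 = \frac{252}{36}$, and their average is $\frac{21}{4} = 5.25$.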

Theorem 9.2. Let $X$ and $Y$ be two random variables with mean $\mu_X$ and
$\mu_Y$, and standard deviation $\sigma_X$ and $\sigma_Y$, respectively. If the conditional
expectation of $Y$ given $X = x$ is linear in $x$, then

$$E(Y \mid X = x) = \mu_Y + \rho\, \frac{\sigma_Y}{\sigma_X}\, (x - \mu_X),$$

where $\rho$ denotes the correlation coefficient of $X$ and $Y$.

Proof: We assume that the random variables X and Y are continuous. If 
they are discrete, the proof of the theorem follows exactly the same way by 
replacing the integrals with summations. We are given that E(Y\X = x) is 
linear in x, that is 

E(Y\X = x) = ax + b, (9.0) 

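where $a$ and $b$ are two constants. Hence, from above we get

$$\int_{-\infty}^{\infty} y\, h(y/x)\, dy = ax + b$$

which implies

$$\int_{-\infty}^{\infty} y\, \frac{f(x,y)}{f_1(x)}\, dy = ax + b.$$

Multiplying both sides by $f_1(x)$, we get

$$\int_{-\infty}^{\infty} y\, f(x,y)\, dy = (ax + b)\, f_1(x). \qquad (9.1)$$

Now integrating with respect to $x$, we get

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y\, f(x,y)\, dy\, dx = \int_{-\infty}^{\infty} (ax + b)\, f_1(x)\, dx.$$

This yields

$$\mu_Y = a\, \mu_X + b. \qquad (9.2)$$

Now, we multiply (9.1) with $x$ and then integrate the resulting expression
with respect to $x$ to get

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xy\, f(x,y)\, dy\, dx = \int_{-\infty}^{\infty} (ax^2 + bx)\, f_1(x)\, dx.$$

From this we get

$$E(XY) = a\, E(X^2) + b\, \mu_X. \qquad (9.3)$$

Solving (9.2) and (9.3) for the unknown $a$ and $b$, we get

$$a = \frac{E(XY) - \mu_X \mu_Y}{\sigma_X^2} = \frac{\sigma_{XY}}{\sigma_X^2} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}\, \frac{\sigma_Y}{\sigma_X} = \rho\, \frac{\sigma_Y}{\sigma_X}.$$

Similarly, we get

$$b = \mu_Y - \rho\, \frac{\sigma_Y}{\sigma_X}\, \mu_X.$$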


Substituting $a$ and $b$ into (9.0) we obtain the asserted result and the proof of the
theorem is now complete.


Example 9.5. Suppose $X$ and $Y$ are random variables with $E(Y \mid X = x) =
-x + 3$ and $E(X \mid Y = y) = -\frac{1}{4} y + 5$. What is the correlation coefficient of
$X$ and $Y$?
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Answer: From the Theorem 9.2, we get 

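$$\mu_Y + \rho\, \frac{\sigma_Y}{\sigma_X}\, (x - \mu_X) = -x + 3.$$

Therefore, equating the coefficients of $x$ terms, we get

$$\rho\, \frac{\sigma_Y}{\sigma_X} = -1. \qquad (9.4)$$

Similarly, since

$$\mu_X + \rho\, \frac{\sigma_X}{\sigma_Y}\, (y - \mu_Y) = -\frac{1}{4} y + 5$$

we have

$$\rho\, \frac{\sigma_X}{\sigma_Y} = -\frac{1}{4}. \qquad (9.5)$$

Multiplying (9.4) with (9.5), we get

$$\rho\, \frac{\sigma_Y}{\sigma_X}\, \rho\, \frac{\sigma_X}{\sigma_Y} = (-1) \left( -\frac{1}{4} \right)$$

which is

$$\rho^2 = \frac{1}{4}.$$

Solving this, we get

$$\rho = \pm \frac{1}{2}.$$

Since $\rho\, \frac{\sigma_Y}{\sigma_X} = -1$ and $\frac{\sigma_Y}{\sigma_X} > 0$, we get

$$\rho = -\frac{1}{2}.$$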
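The reasoning of Example 9.5 generalizes: when both regressions are linear, the product of the two slopes is $\rho^2$ and the common sign of the slopes is the sign of $\rho$. The sketch below encodes that rule; the function name and the error handling are illustrative assumptions, not part of the text.

```python
import math

def correlation_from_slopes(slope_y_on_x, slope_x_on_y):
    """Return rho given the slopes rho*sigma_Y/sigma_X and rho*sigma_X/sigma_Y."""
    product = slope_y_on_x * slope_x_on_y      # equals rho squared
    if product < 0 or product > 1:
        raise ValueError("slopes are not consistent with a correlation coefficient")
    # sigma_Y/sigma_X is positive, so rho carries the sign of either slope.
    return math.copysign(math.sqrt(product), slope_y_on_x)

rho = correlation_from_slopes(-1.0, -0.25)
print(rho)
```

With the slopes $-1$ and $-\frac{1}{4}$ of Example 9.5 this gives $\rho = -\frac{1}{2}$.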


9.2. Conditional Variance 

The variance of the probability density function f(y/x) is called the 
conditional variance of Y given that X = x. This conditional variance is 
defined as follows: 

Definition 9.3. Let $X$ and $Y$ be two random variables with joint density
$f(x,y)$ and let $f(y/x)$ be the conditional density of $Y$ given $X = x$. The
conditional variance of $Y$ given $X = x$, denoted by $Var(Y \mid x)$, is defined as

$$Var(Y \mid x) = E\left(Y^2 \mid x\right) - \left( E(Y \mid x) \right)^2,$$

where $E(Y \mid x)$ denotes the conditional mean of $Y$ given $X = x$.
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Example 9.6. Let X and Y be continuous random variables with joint 
probability density function 


$$f(x,y) = \begin{cases} e^{-y} & \text{for } 0 < x < y < \infty \\ 0 & \text{otherwise.} \end{cases}$$


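What is the conditional variance of $Y$ given the knowledge that $X = x$?

Answer: The marginal density $f_1(x)$ of $X$ is given by

$$f_1(x) = \int_{-\infty}^{\infty} f(x,y)\, dy = \int_x^{\infty} e^{-y}\, dy = \left[ -e^{-y} \right]_x^{\infty} = e^{-x}.$$

Thus, the conditional density of $Y$ given $X = x$ is

$$h(y/x) = \frac{f(x,y)}{f_1(x)} = \frac{e^{-y}}{e^{-x}} = e^{-(y - x)} \qquad \text{for } y > x.$$

Thus, given $X = x$, $Y$ has an exponential distribution with parameter $\theta = 1$
and location parameter $x$. The conditional mean of $Y$ given $X = x$ is

$$\begin{aligned}
E(Y \mid x) &= \int_{-\infty}^{\infty} y\, h(y/x)\, dy \\
&= \int_x^{\infty} y\, e^{-(y - x)}\, dy \\
&= \int_0^{\infty} (z + x)\, e^{-z}\, dz \qquad (\text{where } z = y - x) \\
&= x \int_0^{\infty} e^{-z}\, dz + \int_0^{\infty} z\, e^{-z}\, dz \\
&= x\, \Gamma(1) + \Gamma(2) \\
&= x + 1.
\end{aligned}$$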
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Similarly, we compute the second moment of the distribution h(y/x). 


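$$\begin{aligned}
E\left(Y^2 \mid x\right) &= \int_{-\infty}^{\infty} y^2\, h(y/x)\, dy \\
&= \int_x^{\infty} y^2\, e^{-(y - x)}\, dy \\
&= \int_0^{\infty} (z + x)^2\, e^{-z}\, dz \qquad (\text{where } z = y - x) \\
&= x^2 \int_0^{\infty} e^{-z}\, dz + \int_0^{\infty} z^2\, e^{-z}\, dz + 2x \int_0^{\infty} z\, e^{-z}\, dz \\
&= x^2\, \Gamma(1) + \Gamma(3) + 2x\, \Gamma(2) \\
&= x^2 + 2 + 2x \\
&= (1 + x)^2 + 1.
\end{aligned}$$

Therefore

$$\begin{aligned}
Var(Y \mid x) &= E\left(Y^2 \mid x\right) - \left[ E(Y \mid x) \right]^2 \\
&= (1 + x)^2 + 1 - (1 + x)^2 \\
&= 1.
\end{aligned}$$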

Remark 9.2. The variance of $Y$ is 2. This can be seen as follows: Since the
marginal of $Y$ is given by $f_2(y) = \int_0^y e^{-y}\, dx = y\, e^{-y}$, the expected value of $Y$
is $E(Y) = \int_0^{\infty} y^2 e^{-y}\, dy = \Gamma(3) = 2$, and $E\left(Y^2\right) = \int_0^{\infty} y^3 e^{-y}\, dy = \Gamma(4) = 6$.
Thus, the variance of $Y$ is $Var(Y) = 6 - 4 = 2$. However, given the knowledge
$X = x$, the variance of $Y$ is 1. Thus, in a way the prior knowledge reduces
the variability (or the variance) of a random variable.

Next, we simply state the following theorem concerning the conditional 
variance without proof. 
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Theorem 9.3. Let X and Y be two random variables with mean p\ and 
$\mu_Y$, and standard deviation $\sigma_X$ and $\sigma_Y$, respectively. If the conditional
expectation of $Y$ given $X = x$ is linear in $x$, then

$$E_X\left( Var(Y \mid X) \right) = \left(1 - \rho^2\right) Var(Y),$$

where $\rho$ denotes the correlation coefficient of $X$ and $Y$.

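Example 9.7. Let $E(Y \mid X = x) = 2x$ and $Var(Y \mid X = x) = 4x^2$, and let $X$
have a uniform distribution on the interval from 0 to 1. What is the variance
of $Y$?

Answer: If $E(Y \mid X = x)$ is a linear function of $x$, then

$$E(Y \mid X = x) = \mu_Y + \rho\, \frac{\sigma_Y}{\sigma_X}\, (x - \mu_X)$$

and

$$E_X\left( Var(Y \mid X) \right) = \sigma_Y^2 \left(1 - \rho^2\right).$$

We are given that

$$\mu_Y + \rho\, \frac{\sigma_Y}{\sigma_X}\, (x - \mu_X) = 2x.$$

Hence, equating the coefficient of $x$ terms, we get

$$\rho\, \frac{\sigma_Y}{\sigma_X} = 2$$

which is

$$\rho = 2\, \frac{\sigma_X}{\sigma_Y}. \qquad (9.6)$$

Further, we are given that

$$Var(Y \mid X = x) = 4x^2.$$

Since $X \sim UNIF(0,1)$, we get the density of $X$ to be $f(x) = 1$ on the
interval $(0,1)$. Therefore,

$$E_X\left( Var(Y \mid X) \right) = \int_{-\infty}^{\infty} Var(Y \mid X = x)\, f(x)\, dx = \int_0^1 4x^2\, dx = \frac{4}{3}.$$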
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By Theorem 9.3, 


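$$\begin{aligned}
\frac{4}{3} &= E_X\left( Var(Y \mid X) \right) \\
&= \sigma_Y^2 \left(1 - \rho^2\right) \\
&= \sigma_Y^2 \left(1 - 4\, \frac{\sigma_X^2}{\sigma_Y^2}\right) \\
&= \sigma_Y^2 - 4\, \sigma_X^2.
\end{aligned}$$

Hence

$$\sigma_Y^2 = \frac{4}{3} + 4\, \sigma_X^2.$$

Since $X \sim UNIF(0,1)$, the variance of $X$ is given by $\sigma_X^2 = \frac{1}{12}$. Therefore,
the variance of $Y$ is given by

$$\sigma_Y^2 = \frac{4}{3} + \frac{4}{12} = \frac{16}{12} + \frac{4}{12} = \frac{20}{12} = \frac{5}{3}.$$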
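The value $\frac{5}{3}$ in Example 9.7 can be cross-checked with the decomposition $Var(Y) = E_X(Var(Y \mid X)) + Var(E(Y \mid X))$. This identity (the law of total variance) is a standard fact that is not stated in this section, so its use here is an assumption of the sketch; the helper name is illustrative.

```python
from fractions import Fraction

def uniform01_moment(k):
    """E(X^k) for X ~ UNIF(0, 1), which equals 1/(k + 1)."""
    return Fraction(1, k + 1)

mean_x = uniform01_moment(1)
var_x = uniform01_moment(2) - mean_x ** 2      # 1/12

expected_cond_var = 4 * uniform01_moment(2)    # E(4 X^2) = 4/3
var_cond_mean = 4 * var_x                      # Var(2X) = 4 Var(X) = 1/3

var_y = expected_cond_var + var_cond_mean
print(expected_cond_var, var_cond_mean, var_y)
```

The two pieces are $\frac{4}{3}$ and $\frac{4}{12}$, the same two terms that appear in the last line of the example.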


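Example 9.8. Let $E(X \mid Y = y) = 3y$ and $Var(X \mid Y = y) = 2$, and let $Y$
have density function

$$f(y) = \begin{cases} e^{-y} & \text{if } y > 0 \\ 0 & \text{otherwise.} \end{cases}$$

What is the variance of $X$?

Answer: Since $Var(X \mid Y = y) = 2$ for every $y$, we have $E_Y\left( Var(X \mid Y) \right) = 2$.
By Theorem 9.3, with the roles of $X$ and $Y$ interchanged, we get

$$\sigma_X^2 \left(1 - \rho^2\right) = 2 \qquad (9.7)$$

and

$$\mu_X + \rho\, \frac{\sigma_X}{\sigma_Y}\, (y - \mu_Y) = 3y.$$

Thus

$$\rho = 3\, \frac{\sigma_Y}{\sigma_X}.$$

Hence from (9.7), we get

$$\sigma_X^2 \left(1 - 9\, \frac{\sigma_Y^2}{\sigma_X^2}\right) = 2,$$

which is

$$\sigma_X^2 = 9\, \sigma_Y^2 + 2.$$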
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Now, we compute the variance of Y. For this, we need E(Y) and E ( Y 2 

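$$E(Y) = \int_0^{\infty} y\, f(y)\, dy = \int_0^{\infty} y\, e^{-y}\, dy = \Gamma(2) = 1.$$

Similarly

$$E\left(Y^2\right) = \int_0^{\infty} y^2\, f(y)\, dy = \int_0^{\infty} y^2\, e^{-y}\, dy = \Gamma(3) = 2.$$

Therefore

$$Var(Y) = E\left(Y^2\right) - \left[ E(Y) \right]^2 = 2 - 1 = 1.$$

Hence, the variance of $X$ can be calculated as

$$\sigma_X^2 = 9\, \sigma_Y^2 + 2 = 9(1) + 2 = 11.$$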

Remark 9.3. Notice that, in Example 9.8, we calculated the variance of $Y$
directly using the form of $f(y)$. It is easy to note that $f(y)$ has the form of
an exponential density with parameter $\theta = 1$, and therefore its variance is
the square of the parameter. This immediately gives $\sigma_Y^2 = 1$.

9.3. Regression Curve and Scedastic Curve 

One of the major goals in most statistical studies is to establish relationships
between two or more random variables. For example, a company would
like to know the relationship between the potential sales of a new product
and its price. Historically, regression analysis originated in the
works of Sir Francis Galton (1822-1911), but most of the theory of regression
analysis was developed later by Sir Ronald Fisher (1890-1962).
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Definition 9.4. Let X and Y be two random variables with joint probability 
density function $f(x,y)$ and let $h(y/x)$ be the conditional density of $Y$ given
$X = x$. Then the conditional mean

$$E(Y \mid X = x) = \int_{-\infty}^{\infty} y\, h(y/x)\, dy$$


is called the regression function of Y on X. The graph of this regression 
function of Y on X is known as the regression curve of Y on X. 

Example 9.9. Let X and Y be two random variables with joint density 


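$$f(x,y) = \begin{cases} x\, e^{-x(1 + y)} & \text{if } x > 0;\ y > 0 \\ 0 & \text{otherwise.} \end{cases}$$

What is the regression function of $Y$ on $X$?

Answer: The marginal density $f_1(x)$ of $X$ is

$$\begin{aligned}
f_1(x) &= \int_{-\infty}^{\infty} f(x,y)\, dy \\
&= \int_0^{\infty} x\, e^{-x(1 + y)}\, dy \\
&= \int_0^{\infty} x\, e^{-x}\, e^{-xy}\, dy \\
&= x\, e^{-x} \int_0^{\infty} e^{-xy}\, dy \\
&= x\, e^{-x} \left[ -\frac{1}{x}\, e^{-xy} \right]_0^{\infty} \\
&= e^{-x}.
\end{aligned}$$

The conditional density of $Y$ given $X = x$ is

$$h(y/x) = \frac{f(x,y)}{f_1(x)} = \frac{x\, e^{-x(1 + y)}}{e^{-x}} = x\, e^{-xy}, \qquad y > 0.$$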
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The conditional mean of Y given that X = x is 


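$$\begin{aligned}
E(Y \mid X = x) &= \int_{-\infty}^{\infty} y\, h(y/x)\, dy \\
&= \int_0^{\infty} y\, x\, e^{-xy}\, dy \\
&= \frac{1}{x} \int_0^{\infty} z\, e^{-z}\, dz \qquad (\text{where } z = xy) \\
&= \frac{1}{x}\, \Gamma(2) \\
&= \frac{1}{x}.
\end{aligned}$$

Thus, the regression function (or equation) of $Y$ on $X$ is given by

$$E(Y \mid x) = \frac{1}{x} \qquad \text{for } 0 < x < \infty.$$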




Definition 9.4. Let X and Y be two random variables with joint probability 
density function f(x,y) and let E(Y\X = x) be the regression function of Y 
on X. If this regression function is linear, then E(Y\X = x) is called a linear 
regression of Y on X. Otherwise, it is called nonlinear regression of Y on X. 

Example 9.10. Given the regression lines $E(Y \mid X = x) = x + 2$ and
$E(X \mid Y = y) = 1 + \frac{1}{2} y$, what is the expected value of $X$?

Answer: Since the conditional expectation $E(Y \mid X = x)$ is linear in $x$, we
get

$$\mu_Y + \rho\, \frac{\sigma_Y}{\sigma_X}\, (x - \mu_X) = x + 2.$$

Hence, equating the coefficients of $x$ and constant terms, we get

$$\rho\, \frac{\sigma_Y}{\sigma_X} = 1 \qquad (9.8)$$




Probability and Mathematical Statistics 




and

$$\mu_Y - \rho\, \frac{\sigma_Y}{\sigma_X}\, \mu_X = 2, \qquad (9.9)$$

respectively. Now, using (9.8) in (9.9), we get

$$\mu_Y - \mu_X = 2. \qquad (9.10)$$

Similarly, since $E(X \mid Y = y)$ is linear in $y$, we get

$$\rho\, \frac{\sigma_X}{\sigma_Y} = \frac{1}{2} \qquad (9.11)$$

and

$$\mu_X - \rho\, \frac{\sigma_X}{\sigma_Y}\, \mu_Y = 1. \qquad (9.12)$$

Hence, substituting (9.11) into (9.12) and simplifying, we get

$$2\mu_X - \mu_Y = 2. \qquad (9.13)$$

Now adding (9.13) to (9.10), we see that

$$\mu_X = 4.$$
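Equations (9.10) and (9.13) are the two relations obtained by taking expectations of the two regression lines (Theorem 9.1): if $E(Y \mid X = x) = a_1 x + b_1$ and $E(X \mid Y = y) = a_2 y + b_2$, then $\mu_Y = a_1 \mu_X + b_1$ and $\mu_X = a_2 \mu_Y + b_2$. The sketch below solves this pair exactly; the function and parameter names are illustrative assumptions.

```python
from fractions import Fraction

def means_from_regression_lines(a1, b1, a2, b2):
    """Solve mu_Y = a1*mu_X + b1 and mu_X = a2*mu_Y + b2 for (mu_X, mu_Y)."""
    a1, b1, a2, b2 = map(Fraction, (a1, b1, a2, b2))
    determinant = 1 - a1 * a2
    if determinant == 0:
        raise ValueError("the two lines do not determine the means uniquely")
    mu_x = (a2 * b1 + b2) / determinant
    mu_y = a1 * mu_x + b1
    return mu_x, mu_y

# Example 9.10: E(Y | X = x) = x + 2 and E(X | Y = y) = 1 + y/2.
mu_x, mu_y = means_from_regression_lines(1, 2, Fraction(1, 2), 1)
print(mu_x, mu_y)
```

For Example 9.10 this returns $\mu_X = 4$ and $\mu_Y = 6$, consistent with $\mu_Y - \mu_X = 2$.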

Remark 9.4. In statistics, a linear regression usually means the conditional
expectation $E(Y/x)$ is linear in the parameters, but not necessarily in $x$. Therefore,
$E(Y/x) = \alpha + \theta x^2$ will be a linear model, whereas $E(Y/x) = \alpha\, x^{\theta}$ is not a
linear regression model.

Definition 9.5. Let $X$ and $Y$ be two random variables with joint probability
density function $f(x,y)$ and let $h(y/x)$ be the conditional density of $Y$ given
$X = x$. Then the conditional variance

$$Var(Y \mid X = x) = \int_{-\infty}^{\infty} y^2\, h(y/x)\, dy - \left[ E(Y \mid X = x) \right]^2$$

is called the scedastic function of $Y$ on $X$. The graph of this scedastic function
of $Y$ on $X$ is known as the scedastic curve of $Y$ on $X$.

Scedastic curves and regression curves are used for constructing families 
of bivariate probability density functions with specified marginals. 
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9.4. Review Exercises 

1. Given the regression lines $E(Y \mid X = x) = x + 2$ and $E(X \mid Y = y) = 1 + \frac{1}{2} y$,
what is the expected value of $Y$?

2. If the joint density of $X$ and $Y$ is

$$f(x,y) = \begin{cases} k & \text{if } -1 < x < 1;\ x^2 < y < 1 \\ 0 & \text{elsewhere,} \end{cases}$$

where $k$ is a constant, what is $E(Y \mid X = x)$?

3. Suppose the joint density of $X$ and $Y$ is defined by

$$f(x,y) = \begin{cases} 10xy^2 & \text{if } 0 < x < y < 1 \\ 0 & \text{elsewhere.} \end{cases}$$

What is $E\left(X^2 \mid Y = y\right)$?

4. Let $X$ and $Y$ have joint density function


f 2e 2 G+^) if 0<x<y<oo 

f(x,y) = < 

{0 elsewhere. 

What is the expected value of Y, given X = x, for x > 0 ? 

5. Let $X$ and $Y$ have joint density function

$$f(x,y) = \begin{cases} 8xy & \text{if } 0 < x < 1;\ 0 < y < x \\ 0 & \text{elsewhere.} \end{cases}$$

What is the regression curve $y$ on $x$, that is, $E(Y/X = x)$?

6. Suppose X and Y are random variables with means fix and fjby, respec¬ 
tively; and E(Y\X = x) = 10 and E(X\Y = y) = — \y + 2. What are 

the values of fix and /ry? 

7. Let X and Y have joint density 

f x 1 (x + y) for 0 < 2y < x < 1 

f(x,y) = < 

0 otherwise. 


What is the conditional expectation of X given Y = y ? 
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8. Let X and Y have joint density 


$$f(x,y) = \begin{cases} cxy & \text{for } 0 < y < 2x;\ 1 < x < 5 \\ 0 & \text{otherwise.} \end{cases}$$

What is the conditional expectation of Y given X = x ? 

9. Let $X$ and $Y$ have joint density

$$f(x,y) = \begin{cases} e^{-y} & \text{for } y > x > 0 \\ 0 & \text{otherwise.} \end{cases}$$

What is the conditional expectation of $X$ given $Y = y$?

10. Let $X$ and $Y$ have joint density

$$f(x,y) = \begin{cases} 2xy & \text{for } 0 < y < 2x < 2 \\ 0 & \text{otherwise.} \end{cases}$$

What is the conditional expectation of $Y$ given $X = x$?

11. Let E(Y\X = x) = 2 + 5x, Var(Y\X = x) = 3, and let X have the 
density function 


xe 2 ifO<a:<oo 

0 otherwise. 

What is the mean and variance of random variable Y? 

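12. Let $E(Y \mid X = x) = 2x$ and $Var(Y \mid X = x) = 4x^2 + 3$, and let $X$ have
the density function

$$f(x) = \begin{cases} \frac{4}{\sqrt{\pi}}\, x^2\, e^{-x^2} & \text{for } 0 < x < \infty \\ 0 & \text{elsewhere.} \end{cases}$$

What is the variance of $Y$?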

13. Let $X$ and $Y$ have joint density

$$f(x,y) = \begin{cases} 2 & \text{for } 0 < y < 1 - x \text{ and } 0 < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$

What is the conditional variance of $Y$ given $X = x$?
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14. Let X and Y have joint density 


.f(x,y) = 


for 0 < x < y/y < 1 


elsewhere. 


What is the conditional variance of Y given X = x ? 
15. Let X and Y have joint density 


f(x,y) = 


for 1 < x + y < 2; x > 0, y > 0 


elsewhere. 


What is the marginal density of Y ? What is the conditional variance of X 
given Y = | ? 


16. Let X and Y have joint density 


f(x, y) = 


for 0 < y < 2x < 1 


elsewhere. 


What is the conditional variance of Y given X = 0.5 ? 

17. Let the random variable W denote the number of students who take 
business calculus each semester at the University of Louisville. If the random 
variable W has a Poisson distribution with parameter A equal to 300 and the 
probability of each student passing the course is then on an average how 
many students will pass the business calculus? 

18. If the conditional density of Y given X = x is given by 


f{y/x) = 


0 xy{i-xf-y 


if 7/ = 0,1,2, 
otherwise, 


and the marginal density of X is 


h(x) = 


if 0 < x < 1 


otherwise, 


then what is the conditional expectation of Y given the event X = x? 
19. If the joint density of the random variables X and Y is 


,f(x,y) = 


2+(2x - 1)( 2y-l) ifQ<a , y<1 


otherwise, 
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then what is the regression function of Y on XI 

20. If the joint density of the random variables X and Y is 


f{x,y) 


jgmintx,!/} _ ]_] e -(x+y) if 0 < x, y < 00 
0 otherwise, 


then what is the conditional expectation of Y given X = x? 
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Chapter 10 

TRANSFORMATION OF RANDOM VARIABLES AND THEIR DISTRIBUTIONS


In many statistical applications, given the probability distribution of
a univariate random variable $X$, one would like to know the probability
distribution of another univariate random variable $Y = \phi(X)$, where $\phi$ is
some known function. For example, if we know the probability distribution
of the random variable $X$, we would like to know the distribution of $Y = \ln(X)$.
For a univariate random variable $X$, some commonly used transformed random
variables $Y$ of $X$ are: $Y = X^2$, $Y = |X|$, $Y = \sqrt{|X|}$, $Y = \ln(X)$,
$Y = \frac{X - \mu}{\sigma}$, and $Y = \left( \frac{X - \mu}{\sigma} \right)^2$. Similarly for a bivariate random variable $(X, Y)$,
some of the most common transformations of $X$ and $Y$ are $X + Y$, $XY$, $\frac{X}{Y}$,
$\min\{X, Y\}$, $\max\{X, Y\}$ or $\sqrt{X^2 + Y^2}$. In this chapter, we examine various
methods for finding the distribution of a transformed univariate or bivariate
random variable, when the transformation and the distribution of the variable are
known. First, we treat the univariate case. Then we treat the bivariate case.

We begin with an example for univariate discrete random variable. 

Example 10.1. The probability density function of the random variable X 
is shown in the table below. 


  x    |  -2     -1      0      1      2      3      4
 ------+------------------------------------------------
  f(x) | 1/10   2/10   1/10   1/10   1/10   2/10   2/10

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What is the probability density function of the random variable Y = X 2 ? 

Answer: The space of the random variable $X$ is $R_X = \{-2, -1, 0, 1, 2, 3, 4\}$.
Then the space of the random variable $Y$ is $R_Y = \{x^2 \mid x \in R_X\}$. Thus,
$R_Y = \{0, 1, 4, 9, 16\}$. Now we compute the probability density function $g(y)$
for $y$ in $R_Y$.

$$\begin{aligned}
g(0) &= P(Y = 0) = P(X^2 = 0) = P(X = 0) = \tfrac{1}{10} \\
g(1) &= P(Y = 1) = P(X^2 = 1) = P(X = -1) + P(X = 1) = \tfrac{3}{10} \\
g(4) &= P(Y = 4) = P(X^2 = 4) = P(X = -2) + P(X = 2) = \tfrac{2}{10} \\
g(9) &= P(Y = 9) = P(X^2 = 9) = P(X = 3) = \tfrac{2}{10} \\
g(16) &= P(Y = 16) = P(X^2 = 16) = P(X = 4) = \tfrac{2}{10}.
\end{aligned}$$

We summarize the distribution of Y in the following table. 


  y    |   0      1      4      9     16
 ------+----------------------------------
  g(y) | 1/10   3/10   2/10   2/10   2/10

[Figure: Density Function of Y = X^2]
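The bookkeeping in Example 10.1 amounts to adding up the probabilities of all $x$ values that map to the same $y$. A minimal sketch of that accumulation follows; the function name `pmf_of_transform` and the dictionary representation of the density are illustrative choices, not notation from the text.

```python
from collections import defaultdict
from fractions import Fraction

def pmf_of_transform(pmf_x, transform):
    """Density of Y = transform(X) for a discrete X given as {x: P(X = x)}."""
    pmf_y = defaultdict(Fraction)          # missing keys start at 0
    for x, prob in pmf_x.items():
        pmf_y[transform(x)] += prob        # pool all x with the same image
    return dict(pmf_y)

# Density of X from Example 10.1.
pmf_x = {
    -2: Fraction(1, 10), -1: Fraction(2, 10), 0: Fraction(1, 10),
    1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(2, 10), 4: Fraction(2, 10),
}

pmf_y = pmf_of_transform(pmf_x, lambda x: x ** 2)
for y in sorted(pmf_y):
    print(y, pmf_y[y])
```

Because $x \mapsto x^2$ is not one-to-one on $R_X$, the values $y = 1$ and $y = 4$ each collect probability from two points of $R_X$, while a one-to-one map such as $x \mapsto 2x + 1$ would simply relabel the support.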



Example 10.2. The probability density function of the random variable X 
is shown in the table below. 


  x    |  1     2     3     4     5     6
 ------+-----------------------------------
  f(x) | 1/6   1/6   1/6   1/6   1/6   1/6


What is the probability density function of the random variable Y = 2X + 1? 
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Answer: The space of the random variable X is Rx = {1, 2, 3,4,5,6}. 
Then the space of the random variable $Y$ is $R_Y = \{2x + 1 \mid x \in R_X\}$. Thus,
$R_Y = \{3, 5, 7, 9, 11, 13\}$. Next we compute the probability density function
$g(y)$ for $y$ in $R_Y$. The pdf $g(y)$ is given by

$$\begin{aligned}
g(3) &= P(Y = 3) = P(2X + 1 = 3) = P(X = 1) = \tfrac{1}{6} \\
g(5) &= P(Y = 5) = P(2X + 1 = 5) = P(X = 2) = \tfrac{1}{6} \\
g(7) &= P(Y = 7) = P(2X + 1 = 7) = P(X = 3) = \tfrac{1}{6} \\
g(9) &= P(Y = 9) = P(2X + 1 = 9) = P(X = 4) = \tfrac{1}{6} \\
g(11) &= P(Y = 11) = P(2X + 1 = 11) = P(X = 5) = \tfrac{1}{6} \\
g(13) &= P(Y = 13) = P(2X + 1 = 13) = P(X = 6) = \tfrac{1}{6}.
\end{aligned}$$

We summarize the distribution of Y in the following table. 


  y    |  3     5     7     9    11    13
 ------+-----------------------------------
  g(y) | 1/6   1/6   1/6   1/6   1/6   1/6


The distribution of X and 2X + 1 are illustrated below. 


[Figure: Density Function of X (left) and Density Function of Y = 2X + 1 (right)]


In Example 10.1, we computed the distribution (that is, the probability
density function) of the transformed random variable $Y = \phi(X)$, where
$\phi(x) = x^2$. This transformation is neither increasing nor decreasing (that
is, not monotonic) in the space, $R_X$, of the random variable $X$. Therefore, the
distribution of $Y$ turns out to be quite different from that of $X$. In Example
10.2, the form of the distribution of the transformed random variable $Y = \phi(X)$,
where $\phi(x) = 2x + 1$, is essentially the same. This is mainly due to the fact that
$\phi(x) = 2x + 1$ is monotonic in $R_X$.
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In this chapter, we shall examine the probability density function of trans¬ 
formed random variables by knowing the density functions of the original 
random variables. There are several methods for finding the probability den¬ 
sity function of a transformed random variable. Some of these methods are: 

(1) distribution function method 

(2) transformation method 

(3) convolution method, and 

(4) moment generating function method. 

Among these four methods, the transformation method is the most useful one. 
The convolution method is a special case of this method. The transformation 
method is derived using the distribution function method. 

10.1. Distribution Function Method 

We have seen in chapter six that an easy way to find the probability 
density function of a transformation of continuous random variables is to 
determine its distribution function and then its density function by differen¬ 
tiation. 

Example 10.3. A box is to be constructed so that the height is 4 inches and
its base is $X$ inches by $X$ inches. If $X$ has a standard normal distribution,
what is the distribution of the volume of the box?

Answer: The volume of the box is a random variable, since $X$ is a random
variable. This random variable $V$ is given by $V = 4X^2$. To find the density
function of $V$, we first determine the form of the distribution function $G(v)$
of $V$ and then we differentiate $G(v)$ to find the density function of $V$. The
distribution function of $V$ is given by

$$\begin{aligned}
G(v) &= P(V \le v) \\
&= P\left(4X^2 \le v\right) \\
&= P\left( -\tfrac{1}{2}\sqrt{v} \le X \le \tfrac{1}{2}\sqrt{v} \right) \\
&= \int_{-\frac{1}{2}\sqrt{v}}^{\frac{1}{2}\sqrt{v}} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} x^2}\, dx \\
&= 2 \int_0^{\frac{1}{2}\sqrt{v}} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} x^2}\, dx \qquad (\text{since the integrand is even}).
\end{aligned}$$
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Hence, by the Fundamental Theorem of Calculus, we get 


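$$g(v) = \frac{dG(v)}{dv} = 2\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{1}{2}\sqrt{v} \right)^2}\, \frac{d}{dv}\left( \tfrac{1}{2}\sqrt{v} \right) = \frac{1}{2\sqrt{2\pi v}}\, e^{-\frac{v}{8}}, \qquad v > 0.$$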
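The distribution function method can be checked numerically for Example 10.3. For a standard normal $X$, $P(|X| \le a) = \operatorname{erf}(a/\sqrt{2})$, so $G(v) = \operatorname{erf}\left(\frac{\sqrt{v}}{2\sqrt{2}}\right)$ for $v > 0$, and differentiating gives $g(v) = \frac{1}{2\sqrt{2\pi v}}\, e^{-v/8}$. The sketch below compares a finite-difference derivative of $G$ with this density; the function names and the step size are assumptions made for illustration.

```python
import math

def volume_cdf(v):
    """G(v) = P(4 X^2 <= v) for standard normal X, via the error function."""
    if v <= 0:
        return 0.0
    return math.erf(math.sqrt(v) / (2.0 * math.sqrt(2.0)))

def volume_density(v):
    """g(v) = exp(-v/8) / (2 sqrt(2 pi v)) for v > 0, and 0 otherwise."""
    if v <= 0:
        return 0.0
    return math.exp(-v / 8.0) / (2.0 * math.sqrt(2.0 * math.pi * v))

def numerical_derivative(func, x, h=1e-5):
    """Central-difference approximation to func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

for v in (0.5, 4.0, 10.0):
    print(v, numerical_derivative(volume_cdf, v), volume_density(v))
```

The two printed columns agree to many decimal places, which is what differentiating $G$ is supposed to deliver.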



Example 10.4. If the density function of X is 

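$$f(x) = \begin{cases} \frac{1}{2} & \text{for } -1 < x < 1 \\ 0 & \text{otherwise,} \end{cases}$$

what is the probability density function of $Y = X^2$?

Answer: We first find the cumulative distribution function of $Y$ and then by
differentiation, we obtain the density of $Y$. For $0 < y < 1$, the distribution
function $G(y)$ of $Y$ is given by

$$\begin{aligned}
G(y) &= P(Y \le y) \\
&= P\left(X^2 \le y\right) \\
&= P\left( -\sqrt{y} \le X \le \sqrt{y} \right) \\
&= \int_{-\sqrt{y}}^{\sqrt{y}} \frac{1}{2}\, dx \\
&= \sqrt{y}.
\end{aligned}$$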
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Hence, the density function of Y is given by 

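$$g(y) = \frac{dG(y)}{dy} = \begin{cases} \frac{1}{2\sqrt{y}} & \text{for } 0 < y < 1 \\ 0 & \text{otherwise.} \end{cases}$$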

10.2. Transformation Method for Univariate Case 

The following theorem is the backbone of the transformation method. 

Theorem 10.1. Let $X$ be a continuous random variable with probability
density function $f(x)$. Let $y = T(x)$ be an increasing (or decreasing) function.
Then the density function of the random variable $Y = T(X)$ is given by

$$g(y) = \left| \frac{dx}{dy} \right| f(W(y))$$

where $x = W(y)$ is the inverse function of $T(x)$.

Proof: Suppose y = T{x) is an increasing function. The distribution func¬ 
tion G(y) of Y is given by 

$$\begin{aligned}
G(y) &= P(Y \le y) \\
&= P(T(X) \le y) \\
&= P(X \le W(y)) \\
&= \int_{-\infty}^{W(y)} f(x)\, dx.
\end{aligned}$$
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Then, differentiating we get the density function of Y, which is 


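$$\begin{aligned}
g(y) &= \frac{dG(y)}{dy} \\
&= \frac{d}{dy} \left( \int_{-\infty}^{W(y)} f(x)\, dx \right) \\
&= f(W(y))\, \frac{dW(y)}{dy} \\
&= f(W(y))\, \frac{dx}{dy} \qquad (\text{since } x = W(y)).
\end{aligned}$$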


On the other hand, if y = T(x) is a decreasing function, then the distribution 
function of Y is given by 


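$$\begin{aligned}
G(y) &= P(Y \le y) \\
&= P(T(X) \le y) \\
&= P(X \ge W(y)) \qquad (\text{since } T(x) \text{ is decreasing}) \\
&= 1 - P(X \le W(y)) \\
&= 1 - \int_{-\infty}^{W(y)} f(x)\, dx.
\end{aligned}$$

As before, differentiating we get the density function of $Y$, which is

$$\begin{aligned}
g(y) &= \frac{dG(y)}{dy} \\
&= \frac{d}{dy} \left( 1 - \int_{-\infty}^{W(y)} f(x)\, dx \right) \\
&= -f(W(y))\, \frac{dW(y)}{dy} \\
&= -f(W(y))\, \frac{dx}{dy} \qquad (\text{since } x = W(y)).
\end{aligned}$$

Hence, combining both the cases, we get

$$g(y) = \left| \frac{dx}{dy} \right| f(W(y))$$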


and the proof of the theorem is now complete. 

Example 10.5. Let $Z = \frac{X - \mu}{\sigma}$. If $X \sim N(\mu, \sigma^2)$, what is the probability
density function of $Z$?
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Answer: 


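$$z = U(x) = \frac{x - \mu}{\sigma}.$$

Hence, the inverse of $U$ is given by

$$W(z) = x = \sigma z + \mu.$$

Therefore

$$\frac{dx}{dz} = \sigma.$$

Hence, by Theorem 10.1, the density of $Z$ is given by

$$\begin{aligned}
g(z) &= \left| \frac{dx}{dz} \right| f(W(z)) \\
&= \sigma\, \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2} \left( \frac{W(z) - \mu}{\sigma} \right)^2} \\
&= \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{z\sigma + \mu - \mu}{\sigma} \right)^2} \\
&= \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} z^2}.
\end{aligned}$$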


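Example 10.6. Let $Z = \frac{X - \mu}{\sigma}$. If $X \sim N(\mu, \sigma^2)$, then show that $Z^2$ is
chi-square with one degree of freedom, that is $Z^2 \sim \chi^2(1)$.

Answer: Let $y = T(x) = \left( \frac{x - \mu}{\sigma} \right)^2$. This function is decreasing for $x < \mu$ and
increasing for $x > \mu$, so each $y > 0$ has the two inverse images $x = \mu - \sigma\sqrt{y}$
and $x = \mu + \sigma\sqrt{y}$. On the increasing branch,

$$x = \mu + \sigma\sqrt{y},$$

$$W(y) = \mu + \sigma\sqrt{y}, \qquad y > 0,$$

$$\frac{dx}{dy} = \frac{\sigma}{2\sqrt{y}}.$$

On the decreasing branch $\left| \frac{dx}{dy} \right|$ has the same value, and by the symmetry of $f$
about $\mu$ the density $f$ takes the same value at both inverse images.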
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The density of Y is 


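Applying Theorem 10.1 on each of the two branches $x = \mu \pm \sigma\sqrt{y}$ and adding the
two equal contributions, we get

$$\begin{aligned}
g(y) &= 2 \left| \frac{dx}{dy} \right| f(W(y)) \\
&= 2\, \frac{\sigma}{2\sqrt{y}}\, \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2} \left( \frac{W(y) - \mu}{\sigma} \right)^2} \\
&= \frac{1}{\sqrt{2\pi y}}\, e^{-\frac{1}{2} \left( \frac{\sqrt{y}\,\sigma + \mu - \mu}{\sigma} \right)^2} \\
&= \frac{1}{\sqrt{2\pi y}}\, e^{-\frac{1}{2} y} \\
&= \frac{1}{\sqrt{\pi}\, \sqrt{2}}\, y^{-\frac{1}{2}}\, e^{-\frac{1}{2} y} \\
&= \frac{1}{\Gamma\left(\frac{1}{2}\right) \sqrt{2}}\, y^{-\frac{1}{2}}\, e^{-\frac{1}{2} y}, \qquad y > 0.
\end{aligned}$$

Hence $Y \sim \chi^2(1)$.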



Example 10.7. Let $Y = -\ln X$. If $X \sim UNIF(0,1)$, then what is the
density function of $Y$ where nonzero?

Answer: We are given that

$$y = T(x) = -\ln x.$$

Hence, the inverse of $y = T(x)$ is given by

$$W(y) = x = e^{-y}.$$

Therefore

$$\frac{dx}{dy} = -e^{-y}.$$
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Hence, by Theorem 10.1, the probability density of Y is given by 


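$$\begin{aligned}
g(y) &= \left| \frac{dx}{dy} \right| f(W(y)) \\
&= e^{-y}\, f(W(y)) \\
&= e^{-y}, \qquad y > 0.
\end{aligned}$$

Thus $Y \sim EXP(1)$. Hence, if $X \sim UNIF(0,1)$, then the random variable
$-\ln X \sim EXP(1)$.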


[Figure: PDF of the Random Variable X (left) and PDF of the Random Variable Y (right)]


Although all the examples we have in this section involve continuous 
random variables, the transformation method also works for the discrete 
random variables. 


10.3. Transformation Method for Bivariate Case 


In this section, we extend Theorem 10.1 to the bivariate case and
present some examples to illustrate the importance of this extension. We
state this theorem without a proof.

Theorem 10.2. Let X and Y be two continuous random variables with 
joint density f(x,y). Let U = P(X,Y) and V = Q(X,Y) be functions of X 
and Y. If the functions P(x,y) and Q(x,y) have single valued inverses, say 
X = R(U , V) and Y = S(U, V), then the joint density g(u, v) of U and V is 
given by 

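$$g(u,v) = |J|\, f(R(u,v), S(u,v)),$$

where $J$ denotes the Jacobian and is given by

$$J = \det \begin{pmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{pmatrix} = \frac{\partial x}{\partial u} \frac{\partial y}{\partial v} - \frac{\partial x}{\partial v} \frac{\partial y}{\partial u}.$$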
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Example 10.8. Let X and Y have the joint probability density function 

$$f(x,y) = \begin{cases} 8xy & \text{for } 0 < x < y < 1 \\ 0 & \text{otherwise.} \end{cases}$$

What is the joint density of $U = \frac{X}{Y}$ and $V = Y$?


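Answer: Since

$$U = \frac{X}{Y}, \qquad V = Y,$$

we get by solving for $X$ and $Y$

$$X = UY = UV, \qquad Y = V.$$

Hence, the Jacobian of the transformation is given by

$$J = \frac{\partial x}{\partial u} \frac{\partial y}{\partial v} - \frac{\partial x}{\partial v} \frac{\partial y}{\partial u} = v \cdot 1 - u \cdot 0 = v.$$

The joint density function of $U$ and $V$ is

$$\begin{aligned}
g(u,v) &= |J|\, f(R(u,v), S(u,v)) \\
&= |v|\, f(uv, v) \\
&= v\, 8(uv)\, v \\
&= 8uv^3.
\end{aligned}$$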


Note that, since 


0 < x < y < 1 


we have 


0 < uv < v < 1. 


The last inequalities yield 


0 < uv < v 
0 < v < 1. 
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Therefore, we get 

0 < u < 1 
0 < v < 1. 

Thus, the joint density of U and V is given by 

$$g(u,v) = \begin{cases} 8uv^3 & \text{for } 0 < u < 1;\ 0 < v < 1 \\ 0 & \text{otherwise.} \end{cases}$$




Example 10.9. Let each of the independent random variables X and Y 
have the density function 

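$$f(x) = \begin{cases} e^{-x} & \text{for } 0 < x < \infty \\ 0 & \text{otherwise.} \end{cases}$$

What is the joint density of $U = X$ and $V = 2X + 3Y$ and the domain on
which this density is positive?

Answer: Since

$$U = X, \qquad V = 2X + 3Y,$$

we get by solving for $X$ and $Y$

$$X = U, \qquad Y = \frac{1}{3} V - \frac{2}{3} U.$$

Hence, the Jacobian of the transformation is given by

$$J = \frac{\partial x}{\partial u} \frac{\partial y}{\partial v} - \frac{\partial x}{\partial v} \frac{\partial y}{\partial u} = 1 \cdot \frac{1}{3} - 0 \cdot \left( -\frac{2}{3} \right) = \frac{1}{3}.$$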
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The joint density function of U and V is 


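$$\begin{aligned}
g(u,v) &= |J|\, f(R(u,v), S(u,v)) \\
&= \left| \frac{1}{3} \right| f\left( u, \frac{1}{3} v - \frac{2}{3} u \right) \\
&= \frac{1}{3}\, e^{-u}\, e^{-\frac{1}{3} v + \frac{2}{3} u} \\
&= \frac{1}{3}\, e^{-\left( \frac{u + v}{3} \right)}.
\end{aligned}$$

Since

$$0 < x < \infty, \qquad 0 < y < \infty,$$

we get

$$0 < u < \infty, \qquad 0 < v < \infty.$$

Further, since $v = 2u + 3y$ and $3y > 0$, we have

$$v > 2u.$$

Hence, the domain of $g(u,v)$ where nonzero is given by

$$0 < 2u < v < \infty.$$

The joint density $g(u,v)$ of the random variables $U$ and $V$ is given by

$$g(u,v) = \begin{cases} \frac{1}{3}\, e^{-\left( \frac{u + v}{3} \right)} & \text{for } 0 < 2u < v < \infty \\ 0 & \text{otherwise.} \end{cases}$$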
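Theorem 10.2 can be exercised numerically on Example 10.9. The sketch below evaluates $g(u,v) = |J|\, f(R(u,v), S(u,v))$ with the Jacobian approximated by central differences, and compares it with the closed form $\frac{1}{3} e^{-(u+v)/3}$ on $0 < 2u < v$. The function names, the step size `h` and the test points are assumptions made for illustration.

```python
import math

def joint_density_uv(f, R, S, u, v, h=1e-6):
    """g(u, v) = |J| f(R(u, v), S(u, v)) with a finite-difference Jacobian."""
    dx_du = (R(u + h, v) - R(u - h, v)) / (2 * h)
    dx_dv = (R(u, v + h) - R(u, v - h)) / (2 * h)
    dy_du = (S(u + h, v) - S(u - h, v)) / (2 * h)
    dy_dv = (S(u, v + h) - S(u, v - h)) / (2 * h)
    jacobian = dx_du * dy_dv - dx_dv * dy_du
    return abs(jacobian) * f(R(u, v), S(u, v))

def f_xy(x, y):
    """Joint density of independent X, Y with common density e^{-x} on x > 0."""
    return math.exp(-x - y) if x > 0 and y > 0 else 0.0

def R(u, v):
    return u                      # X = U

def S(u, v):
    return v / 3.0 - 2.0 * u / 3.0   # Y = V/3 - 2U/3

inside = joint_density_uv(f_xy, R, S, 1.0, 5.0)    # point with 0 < 2u < v
outside = joint_density_uv(f_xy, R, S, 3.0, 5.0)   # point with v < 2u
print(inside, math.exp(-(1.0 + 5.0) / 3.0) / 3.0, outside)
```

Inside the region the numerical value matches the closed form, and outside it the density is zero because the point maps back to $y < 0$. A numerical Jacobian is used only to keep the sketch generic; for a linear transformation such as this one the Jacobian is of course the constant $\frac{1}{3}$.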


Example 10.10. Let X and Y be independent random variables, each with 
density function 


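$$f(x) = \begin{cases} \lambda\, e^{-\lambda x} & \text{for } 0 < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $\lambda > 0$. Let $U = X + 2Y$ and $V = 2X + Y$. What is the joint density
of $U$ and $V$?

Answer: Since

$$U = X + 2Y, \qquad V = 2X + Y,$$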
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we get by solving for X and Y 


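$$X = -\frac{1}{3} U + \frac{2}{3} V, \qquad Y = \frac{2}{3} U - \frac{1}{3} V.$$

Hence, the Jacobian of the transformation is given by

$$\begin{aligned}
J &= \frac{\partial x}{\partial u} \frac{\partial y}{\partial v} - \frac{\partial x}{\partial v} \frac{\partial y}{\partial u} \\
&= \left( -\frac{1}{3} \right) \left( -\frac{1}{3} \right) - \left( \frac{2}{3} \right) \left( \frac{2}{3} \right) \\
&= \frac{1}{9} - \frac{4}{9} \\
&= -\frac{1}{3}.
\end{aligned}$$

The joint density function of $U$ and $V$ is

$$\begin{aligned}
g(u,v) &= |J|\, f(R(u,v), S(u,v)) \\
&= \left| -\frac{1}{3} \right| f(R(u,v))\, f(S(u,v)) \\
&= \frac{1}{3}\, \lambda\, e^{-\lambda R(u,v)}\, \lambda\, e^{-\lambda S(u,v)} \\
&= \frac{1}{3}\, \lambda^2\, e^{-\lambda [R(u,v) + S(u,v)]} \\
&= \frac{1}{3}\, \lambda^2\, e^{-\lambda \left( \frac{u + v}{3} \right)}.
\end{aligned}$$

Since $x > 0$ and $y > 0$ require $2v - u > 0$ and $2u - v > 0$, this density is positive
only for $0 < \frac{u}{2} < v < 2u < \infty$. Hence, the joint density $g(u,v)$ of the random
variables $U$ and $V$ is given by

$$g(u,v) = \begin{cases} \frac{1}{3}\, \lambda^2\, e^{-\lambda \left( \frac{u + v}{3} \right)} & \text{for } 0 < \frac{u}{2} < v < 2u < \infty \\ 0 & \text{otherwise.} \end{cases}$$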

Example 10.11. Let $X$ and $Y$ be independent random variables, each with
density function

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} x^2}, \qquad -\infty < x < \infty.$$

Let $U = \frac{X}{Y}$ and $V = Y$. What is the joint density of $U$ and $V$? Also, what
is the density of $U$?
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Answer: Since 


V = Y , 


we get by solving for X and Y 


x = uv\ 

Y = V. J 

Hence, the Jacobian of the transformation is given by 

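$$
\begin{aligned}
J &= \frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u} \\
&= v \cdot (1) - u \cdot (0) \\
&= v.
\end{aligned}
$$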

The joint density function of U and V is 

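$$
\begin{aligned}
g(u,v) &= |J|\, f(R(u,v), S(u,v)) \\
&= |v|\, f(R(u,v))\, f(S(u,v)) \\
&= |v|\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}R^2(u,v)}\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}S^2(u,v)} \\
&= |v|\, \frac{1}{2\pi}\, e^{-\frac{1}{2}\left[R^2(u,v) + S^2(u,v)\right]} \\
&= |v|\, \frac{1}{2\pi}\, e^{-\frac{1}{2}\left[u^2v^2 + v^2\right]} \\
&= |v|\, \frac{1}{2\pi}\, e^{-\frac{1}{2}v^2(u^2+1)}.
\end{aligned}
$$

Hence, the joint density g(u, v) of the random variables U and V is given by

$$g(u,v) = |v|\, \frac{1}{2\pi}\, e^{-\frac{1}{2}v^2(u^2+1)},$$

where $-\infty < u < \infty$ and $-\infty < v < \infty$.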
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Next, we want to find the density of U. We can obtain this by finding the 
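marginal of U from the joint density of U and V. Hence, the marginal $g_1(u)$
of U is given by

$$
\begin{aligned}
g_1(u) &= \int_{-\infty}^{\infty} g(u,v)\, dv \\
&= \int_{-\infty}^{\infty} |v|\, \frac{1}{2\pi}\, e^{-\frac{1}{2}v^2(u^2+1)}\, dv \\
&= \int_{-\infty}^{0} -v\, \frac{1}{2\pi}\, e^{-\frac{1}{2}v^2(u^2+1)}\, dv + \int_{0}^{\infty} v\, \frac{1}{2\pi}\, e^{-\frac{1}{2}v^2(u^2+1)}\, dv \\
&= \frac{1}{2\pi}\left[\frac{1}{u^2+1}\, e^{-\frac{1}{2}v^2(u^2+1)}\right]_{-\infty}^{0} + \frac{1}{2\pi}\left[-\frac{1}{u^2+1}\, e^{-\frac{1}{2}v^2(u^2+1)}\right]_{0}^{\infty} \\
&= \frac{1}{2\pi}\, \frac{1}{u^2+1} + \frac{1}{2\pi}\, \frac{1}{u^2+1} \\
&= \frac{1}{\pi (u^2+1)}.
\end{aligned}
$$

Thus U ~ CAU(1).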
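The result of Example 10.11 can be checked by simulation. The following is a minimal sketch, assuming only Python's standard `random` module; the function name `cauchy_ratio_check`, the sample size, and the seed are illustrative choices, not part of the text. It uses the fact that a variable with density $\frac{1}{\pi(u^2+1)}$ satisfies $P(|U| < 1) = \frac{1}{2}$.

```python
import random

def cauchy_ratio_check(n_samples=100_000, seed=12345):
    """Estimate P(|X/Y| < 1) for independent standard normal X and Y."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    inside = 0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        y = rng.gauss(0.0, 1.0)
        if y == 0.0:
            # Probability-zero event; skipped only to avoid dividing by zero.
            continue
        if abs(x / y) < 1.0:
            inside += 1
    return inside / n_samples

estimate = cauchy_ratio_check()
print("estimated P(|U| < 1):", estimate)  # theory for the density above: 0.5
```

A simulation of this kind does not prove the result; it only gives a quick sanity check that the derived marginal is plausible.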


Remark 10.1. If X and Y are independent and standard normal random
variables, then the quotient $\frac{X}{Y}$ is always a Cauchy random variable. However,
the converse of this is not true. For example, if X and Y are independent
and each have the same density function

$$f(x) = \frac{\sqrt{2}}{\pi}\, \frac{x^2}{1 + x^4}, \qquad -\infty < x < \infty,$$

then it can be shown that the random variable $\frac{X}{Y}$ is a Cauchy random variable.
Laha (1959) and Kotlarski (1960) have given a complete description
of the family of all probability density functions f such that the quotient $\frac{X}{Y}$
follows the standard Cauchy distribution whenever X and Y are independent
and identically distributed random variables with common density f.

Example 10.12. Let X have a Poisson distribution with mean $\lambda$. Find a
transformation T(x) so that Var(T(X)) is free of $\lambda$, for large values of $\lambda$.

Answer: We expand the function T(x) by Taylor's series about $\lambda$. Then,
neglecting the higher order terms for large values of $\lambda$, we get

$$T(x) = T(\lambda) + (x - \lambda)\, T'(\lambda) + \cdots$$

where $T'(\lambda)$ represents the derivative of T(x) at $x = \lambda$. Now, we compute the
variance of T(X).

$$
\begin{aligned}
Var(T(X)) &= Var\left(T(\lambda) + (X - \lambda)\, T'(\lambda) + \cdots\right) \\
&= Var(T(\lambda)) + Var\left((X - \lambda)\, T'(\lambda)\right) \\
&= 0 + [T'(\lambda)]^2\, Var(X - \lambda) \\
&= [T'(\lambda)]^2\, Var(X) \\
&= [T'(\lambda)]^2\, \lambda.
\end{aligned}
$$

We want Var(T(X)) to be free of $\lambda$ for large $\lambda$. Therefore, we have

$$[T'(\lambda)]^2\, \lambda = k,$$

where k is a constant. From this, we get

$$T'(\lambda) = \frac{c}{\sqrt{\lambda}},$$

where $c = \sqrt{k}$. Solving this differential equation, we get

$$T(\lambda) = c \int \frac{1}{\sqrt{\lambda}}\, d\lambda = 2c\sqrt{\lambda}.$$

Hence, the transformation $T(x) = 2c\sqrt{x}$ will free Var(T(X)) of $\lambda$ if the
random variable X ~ POI($\lambda$).
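The effect of the square-root transformation can be seen numerically. The sketch below is an illustration under stated assumptions: it takes c = 1, truncates the Poisson sum far in the upper tail, and uses helper names (`poisson_pmf`, `variance_of_transform`, `sqrt_transform`) that are hypothetical. It compares the variance of X itself, which grows with $\lambda$, with the variance of $2\sqrt{X}$, which stays close to 1.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability P(X = k), computed on the log scale for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def variance_of_transform(transform, lam):
    """Var(transform(X)) for X ~ POI(lam), summing the pmf over a truncated range."""
    upper = int(lam + 20.0 * math.sqrt(lam) + 50)  # mass beyond this is negligible
    mean = 0.0
    second_moment = 0.0
    for k in range(upper + 1):
        p = poisson_pmf(k, lam)
        t = transform(k)
        mean += p * t
        second_moment += p * t * t
    return second_moment - mean * mean

def identity(x):
    return float(x)

def sqrt_transform(x):
    return 2.0 * math.sqrt(x)  # T(x) = 2c sqrt(x) with c = 1 (an assumption)

for lam in (25.0, 100.0, 400.0):
    print(lam, variance_of_transform(identity, lam),
          variance_of_transform(sqrt_transform, lam))
```

The second column tracks $\lambda$, while the third should hover near 1, moving closer as $\lambda$ grows, which is the sense in which the variance is "free of $\lambda$ for large $\lambda$".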

Example 10.13. Let X ~ POI($\lambda_1$) and Y ~ POI($\lambda_2$). What is the
probability density function of X + Y if X and Y are independent?

Answer: Let us denote U = X + Y and V = X. First of all, we find the
joint density of U and V, and then by summing the joint density we determine
the marginal of U, which is the density function of X + Y. Now writing X
and Y in terms of U and V, we get

$$X = V, \qquad Y = U - X = U - V.$$

Hence, the Jacobian of the transformation is given by

$$
\begin{aligned}
J &= \frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u} \\
&= (0)(-1) - (1)(1) \\
&= -1.
\end{aligned}
$$



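Since X and Y are independent, the joint density of U and V is

$$g(u,v) = \frac{e^{-\lambda_1} \lambda_1^{v}}{v!}\, \frac{e^{-\lambda_2} \lambda_2^{u-v}}{(u-v)!}, \qquad v = 0, 1, \ldots, u; \quad u = 0, 1, 2, \ldots$$

Summing over v and using the binomial theorem, the marginal of U is

$$g_1(u) = \sum_{v=0}^{u} g(u,v) = \frac{e^{-(\lambda_1 + \lambda_2)}}{u!} \sum_{v=0}^{u} \binom{u}{v} \lambda_1^{v} \lambda_2^{u-v} = \frac{e^{-(\lambda_1+\lambda_2)} (\lambda_1 + \lambda_2)^{u}}{u!}.$$

This example tells us that if X ~ POI($\lambda_1$) and Y ~ POI($\lambda_2$) and they are
independent, then X + Y ~ POI($\lambda_1 + \lambda_2$).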
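The conclusion of Example 10.13 can be verified numerically by summing the joint density over v. This is a small sketch, assuming illustrative parameter values $\lambda_1 = 1.5$ and $\lambda_2 = 2.5$; the function names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ POI(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def pmf_of_sum(u, lam1, lam2):
    """Marginal of U = X + Y: sum P(X = v) P(Y = u - v) over v = 0, ..., u."""
    return sum(poisson_pmf(v, lam1) * poisson_pmf(u - v, lam2)
               for v in range(u + 1))

lam1, lam2 = 1.5, 2.5  # illustrative values, not taken from the text
for u in range(6):
    # The two printed columns should agree: summed joint density vs POI(lam1 + lam2).
    print(u, pmf_of_sum(u, lam1, lam2), poisson_pmf(u, lam1 + lam2))
```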
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Theorem 10.3. Let the joint density of the random variables X and Y be 
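f(x,y). Then the probability density functions of X + Y, XY, and $\frac{Y}{X}$ are given
by

$$
\begin{aligned}
h_{X+Y}(v) &= \int_{-\infty}^{\infty} f(u, v - u)\, du \\
h_{XY}(v) &= \int_{-\infty}^{\infty} \frac{1}{|u|}\, f\!\left(u, \frac{v}{u}\right) du \\
h_{Y/X}(v) &= \int_{-\infty}^{\infty} |u|\, f(u, vu)\, du,
\end{aligned}
$$

respectively.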


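Proof: Let U = X and V = X + Y, so that X = R(U, V) = U and
Y = S(U, V) = V - U. Hence, the Jacobian of the transformation is given
by

$$J = \frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u} = 1.$$

The joint density function of U and V is

$$
\begin{aligned}
g(u,v) &= |J|\, f(R(u,v), S(u,v)) \\
&= f(R(u,v), S(u,v)) \\
&= f(u, v - u).
\end{aligned}
$$

Hence, the marginal density of V = X + Y is given by

$$h_{X+Y}(v) = \int_{-\infty}^{\infty} f(u, v - u)\, du.$$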

Similarly, one can obtain the other two density functions. This completes 
the proof. 

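In addition, if the random variables X and Y in Theorem 10.3 are independent
and have the probability density functions f(x) and g(y) respectively,
then we have

$$
\begin{aligned}
h_{X+Y}(z) &= \int_{-\infty}^{\infty} g(y)\, f(z - y)\, dy \\
h_{XY}(z) &= \int_{-\infty}^{\infty} \frac{1}{|y|}\, g(y)\, f\!\left(\frac{z}{y}\right) dy \\
h_{X/Y}(z) &= \int_{-\infty}^{\infty} |y|\, g(y)\, f(zy)\, dy.
\end{aligned}
$$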





Each of the following figures shows how the distribution of the random
variable X + Y is obtained from the joint distribution of (X, Y).
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Example 10.14. Roll an unbiased die twice. If X denotes the outcome 
in the first roll and Y denotes the outcome in the second roll, what is the 
distribution of the random variable Z = max{X, Y}?

Answer: The space of X is $R_X = \{1, 2, 3, 4, 5, 6\}$. Similarly, the space of Y
is $R_Y = \{1, 2, 3, 4, 5, 6\}$. Hence the space of the random variable (X, Y) is
$R_X \times R_Y$. The following table shows the distribution of (X, Y).
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The space of the random variable Z = max{X, Y} is Rz = {1,2,3,4,5,6}. 
Thus Z = 1 only if (X, Y) = (1,1). Hence P(Z = 1) = 1/36. Similarly, Z = 2
only if (X, Y) = (1,2), (2,2) or (2,1). Hence, P(Z = 2) = 3/36. Proceeding in
a similar manner, we get the distribution of Z, which is summarized in the
table below.

z        1       2       3       4       5       6
h(z)   1/36    3/36    5/36    7/36    9/36    11/36


In this example, the random variable Z may be described as the best out of 
two rolls. Note that the probability density of Z can also be stated as 

$$h(z) = \frac{2z - 1}{36}, \qquad \text{for } z \in \{1, 2, 3, 4, 5, 6\}.$$
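The table for Z = max{X, Y} can be reproduced by enumerating all 36 equally likely outcomes. The following sketch uses exact rational arithmetic; the function name `max_distribution` and its parameters are illustrative, not from the text.

```python
from fractions import Fraction
from itertools import product

def max_distribution(sides=6, rolls=2):
    """Exact distribution of the maximum of `rolls` independent fair dice."""
    prob = Fraction(1, sides ** rolls)  # every outcome is equally likely
    h = {z: Fraction(0) for z in range(1, sides + 1)}
    for outcome in product(range(1, sides + 1), repeat=rolls):
        h[max(outcome)] += prob
    return h

h = max_distribution()
for z in sorted(h):
    # Fractions print in lowest terms, e.g. 3/36 appears as 1/12.
    print(z, h[z], Fraction(2 * z - 1, 36))
```

Calling `max_distribution(rolls=3)` gives the analogous distribution for three rolls.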

10.4. Convolution Method for Sums of Random Variables 

In this section, we illustrate how the convolution technique can be used in
finding the distribution of the sum of random variables when they are independent.
This convolution technique does not work if the random variables
are not independent.

Definition 10.1. Let f and g be two real valued functions. The convolution
of f and g, denoted by f * g, is defined as

$$
\begin{aligned}
(f * g)(z) &= \int_{-\infty}^{\infty} f(z - y)\, g(y)\, dy \\
&= \int_{-\infty}^{\infty} g(z - x)\, f(x)\, dx.
\end{aligned}
$$

Hence from this definition it is clear that f * g = g * f.

Let X and Y be two independent random variables with probability 
density functions f(x) and g(y). Then by Theorem 10.3, we get 


$$h(z) = \int_{-\infty}^{\infty} f(z - y)\, g(y)\, dy.$$


Thus, this result shows that the density of the random variable Z = X + Y 
is the convolution of the density of X with the density of Y. 

Example 10.15. What is the probability density of the sum of two independent
random variables, each of which is uniformly distributed over the
interval [0, 1]?

Answer: Let Z = X + Y, where X ~ UNIF(0,1) and Y ~ UNIF(0,1).
Hence, the density function f(x) of the random variable X is given by

$$
f(x) = \begin{cases} 1 & \text{for } 0 \le x \le 1 \\ 0 & \text{otherwise.} \end{cases}
$$

Similarly, the density function g(y) of Y is given by

$$
g(y) = \begin{cases} 1 & \text{for } 0 \le y \le 1 \\ 0 & \text{otherwise.} \end{cases}
$$

Since X and Y are independent, the density function of Z can be obtained
by the method of convolution. Since the sum z = x + y is between 0 and 2,
we consider two cases. First, suppose $0 \le z \le 1$, then

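$$
\begin{aligned}
h(z) &= (f * g)(z) \\
&= \int_{-\infty}^{\infty} f(z - x)\, g(x)\, dx \\
&= \int_{0}^{1} f(z - x)\, g(x)\, dx \\
&= \int_{0}^{z} f(z - x)\, g(x)\, dx + \int_{z}^{1} f(z - x)\, g(x)\, dx \\
&= \int_{0}^{z} f(z - x)\, g(x)\, dx + 0 \qquad (\text{since } f(z - x) = 0 \text{ between } z \text{ and } 1) \\
&= \int_{0}^{z} dx \\
&= z.
\end{aligned}
$$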


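Similarly, if $1 \le z \le 2$, then

$$
\begin{aligned}
h(z) &= (f * g)(z) \\
&= \int_{-\infty}^{\infty} f(z - x)\, g(x)\, dx \\
&= \int_{0}^{1} f(z - x)\, g(x)\, dx \\
&= \int_{0}^{z-1} f(z - x)\, g(x)\, dx + \int_{z-1}^{1} f(z - x)\, g(x)\, dx \\
&= 0 + \int_{z-1}^{1} f(z - x)\, g(x)\, dx \qquad (\text{since } f(z - x) = 0 \text{ between } 0 \text{ and } z - 1) \\
&= \int_{z-1}^{1} dx \\
&= 2 - z.
\end{aligned}
$$

Thus, the density function of Z = X + Y is given by

$$
h(z) = \begin{cases} 0 & \text{for } -\infty < z < 0 \\ z & \text{for } 0 \le z \le 1 \\ 2 - z & \text{for } 1 \le z \le 2 \\ 0 & \text{for } 2 < z < \infty. \end{cases}
$$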


[Figure: PDF of the random variable X, and the distribution of Z = X + Y (Simpson distribution).]


The graph of this density function looks like a tent and it is called a tent function.
In the literature, this density function is known as Simpson's
distribution.
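The tent shape can be recovered numerically from the convolution integral. The sketch below approximates $\int f(z - x)\, g(x)\, dx$ by a midpoint Riemann sum; the helper names (`uniform_pdf`, `convolve_at`, `tent`) and the step count are assumptions made for illustration.

```python
def uniform_pdf(x):
    """Density of UNIF(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def convolve_at(f, g, z, lower, upper, steps=20_000):
    """Midpoint-rule approximation of the integral of f(z - x) g(x) over [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width
        total += f(z - x) * g(x)
    return total * width

def tent(z):
    """Closed-form density of the sum of two independent UNIF(0, 1) variables."""
    if 0.0 <= z <= 1.0:
        return z
    if 1.0 < z <= 2.0:
        return 2.0 - z
    return 0.0

for z in (0.25, 0.5, 1.0, 1.5, 1.75):
    # g vanishes outside [0, 1], so integrating over [0, 1] is enough.
    print(z, convolve_at(uniform_pdf, uniform_pdf, z, 0.0, 1.0), tent(z))
```

The same `convolve_at` helper can be reused for other densities, provided the integration range covers the region where the integrand is nonzero.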

Example 10.16. What is the probability density of the sum of two independent
random variables, each of which is gamma with parameters $\alpha = 1$
and $\theta = 1$?

Answer: Let Z = X + Y, where X ~ GAM(1,1) and Y ~ GAM(1,1).
Hence, the density function f(x) of the random variable X is given by

$$
f(x) = \begin{cases} e^{-x} & \text{for } 0 < x < \infty \\ 0 & \text{otherwise.} \end{cases}
$$

Similarly, the density function g(y) of Y is given by

$$
g(y) = \begin{cases} e^{-y} & \text{for } 0 < y < \infty \\ 0 & \text{otherwise.} \end{cases}
$$

Since X and Y are independent, the density function of Z can be obtained
by the method of convolution. Notice that the sum z = x + y is between 0
and $\infty$, and 0 < x < z. Hence, the density function of Z is given by

$$
\begin{aligned}
h(z) &= (f * g)(z) \\
&= \int_{-\infty}^{\infty} f(z - x)\, g(x)\, dx \\
&= \int_{0}^{\infty} f(z - x)\, g(x)\, dx \\
&= \int_{0}^{z} e^{-(z - x)}\, e^{-x}\, dx \\
&= \int_{0}^{z} e^{-z + x}\, e^{-x}\, dx \\
&= \int_{0}^{z} e^{-z}\, dx \\
&= z\, e^{-z} \\
&= \frac{1}{\Gamma(2)\, 1^2}\, z^{2-1}\, e^{-\frac{z}{1}}.
\end{aligned}
$$


Hence Z ~ GAM(1,2). Thus, if X ~ GAM(1,1) and Y ~ GAM(1,1),
then X + Y ~ GAM(1,2), that is, X + Y is a gamma with $\alpha = 2$ and $\theta = 1$.
Recall that a gamma random variable with $\alpha = 1$ is known as an exponential
random variable with parameter $\theta$. Thus, in view of the above example, we
see that the sum of two independent exponential random variables is not
necessarily an exponential variable.

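Example 10.17. What is the probability density of the sum of two independent
random variables, each of which is standard normal?

Answer: Let Z = X + Y, where X ~ N(0,1) and Y ~ N(0,1). Hence, the
density function f(x) of the random variable X is given by

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}.$$

Similarly, the density function g(y) of Y is given by

$$g(y) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{y^2}{2}}.$$

Since X and Y are independent, the density function of Z can be obtained
by the method of convolution. Notice that the sum z = x + y is between $-\infty$
and $\infty$. Hence, the density function of Z is given by

$$
\begin{aligned}
h(z) &= (f * g)(z) \\
&= \int_{-\infty}^{\infty} f(z - x)\, g(x)\, dx \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-\frac{(z-x)^2}{2}}\, e^{-\frac{x^2}{2}}\, dx \\
&= \frac{1}{2\pi}\, e^{-\frac{z^2}{4}} \int_{-\infty}^{\infty} e^{-\left(x - \frac{z}{2}\right)^2}\, dx \\
&= \frac{1}{2\pi}\, e^{-\frac{z^2}{4}}\, \sqrt{\pi} \left[ \int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi}}\, e^{-\left(x - \frac{z}{2}\right)^2}\, dx \right] \\
&= \frac{1}{\sqrt{4\pi}}\, e^{-\frac{z^2}{4}} \left[ \int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi}}\, e^{-w^2}\, dw \right], \qquad \text{where } w = x - \frac{z}{2} \\
&= \frac{1}{\sqrt{4\pi}}\, e^{-\frac{z^2}{4}} \\
&= \frac{1}{\sqrt{4\pi}}\, e^{-\frac{1}{2}\left(\frac{z - 0}{\sqrt{2}}\right)^2}.
\end{aligned}
$$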

The integral in the brackets equals one, since the integrand is the normal
density function with mean $\mu = 0$ and variance $\sigma^2 = \frac{1}{2}$. Hence the sum of two
independent standard normal random variables is again a normal random variable with
mean zero and variance 2.

Example 10.18. What is the probability density of the sum of two independent
random variables, each of which is Cauchy?

Answer: Let Z = X + Y, where X ~ CAU(0) and Y ~ CAU(0). Hence, the
density functions f(x) and g(y) of the random variables X and Y are given by

$$f(x) = \frac{1}{\pi (1 + x^2)} \qquad \text{and} \qquad g(y) = \frac{1}{\pi (1 + y^2)},$$

respectively. Since X and Y are independent, the density function of Z can
be obtained by the method of convolution. Notice that the sum z = x + y is
between $-\infty$ and $\infty$. Hence, the density function of Z is given by

$$
\begin{aligned}
h(z) &= (f * g)(z) \\
&= \int_{-\infty}^{\infty} f(z - x)\, g(x)\, dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\pi (1 + (z - x)^2)}\, \frac{1}{\pi (1 + x^2)}\, dx \\
&= \frac{1}{\pi^2} \int_{-\infty}^{\infty} \frac{1}{1 + (z - x)^2}\, \frac{1}{1 + x^2}\, dx.
\end{aligned}
$$

To integrate the above integral, we decompose the integrand using partial
fraction decomposition. Hence

$$\frac{1}{1 + (z - x)^2}\, \frac{1}{1 + x^2} = \frac{2Ax + B}{1 + x^2} + \frac{2C(z - x) + D}{1 + (z - x)^2}$$

where

$$A = \frac{1}{z (4 + z^2)} = C \qquad \text{and} \qquad B = \frac{1}{4 + z^2} = D.$$

Now integration yields

$$
\begin{aligned}
\frac{1}{\pi^2} \int_{-\infty}^{\infty} \frac{1}{1 + (z - x)^2}\, \frac{1}{1 + x^2}\, dx
&= \frac{1}{\pi^2 z^2 (4 + z^2)} \left[ z \ln\!\left( \frac{1 + x^2}{1 + (z - x)^2} \right) + z^2 \tan^{-1} x - z^2 \tan^{-1}(z - x) \right]_{-\infty}^{\infty} \\
&= \frac{1}{\pi^2 z^2 (4 + z^2)} \left[ 0 + z^2 \pi + z^2 \pi \right] \\
&= \frac{2}{\pi (4 + z^2)}.
\end{aligned}
$$

Hence the sum of two independent standard Cauchy random variables is not a
standard Cauchy random variable; its density is a Cauchy density rescaled by a factor of 2.

If X ~ CAU(0) and Y ~ CAU(0), then it can be easily shown using
Example 10.18 that the random variable $Z = \frac{X + Y}{2}$ is again Cauchy, that is
Z ~ CAU(0). This is a remarkable property of the Cauchy distribution.
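The closed form $\frac{2}{\pi(4 + z^2)}$ can be checked against a direct numerical evaluation of the convolution integral. The sketch below is one possible approach, not the text's method: it substitutes x = tan(t), which turns $g(x)\,dx$ into $dt/\pi$ on $(-\pi/2, \pi/2)$ and makes the infinite range finite. All function names are illustrative.

```python
import math

def cauchy_pdf(x):
    """Density of the standard Cauchy distribution."""
    return 1.0 / (math.pi * (1.0 + x * x))

def cauchy_sum_pdf(z, steps=20_000):
    """Numerical convolution of two standard Cauchy densities at z.

    With x = tan(t), g(x) dx = dt / pi, so the integral becomes
    (1 / pi) * integral of f(z - tan t) dt over (-pi/2, pi/2).
    """
    width = math.pi / steps
    total = 0.0
    for i in range(steps):
        t = -math.pi / 2.0 + (i + 0.5) * width  # midpoints avoid the endpoints
        total += cauchy_pdf(z - math.tan(t))
    return total * width / math.pi

def closed_form(z):
    """Density of the sum derived in Example 10.18."""
    return 2.0 / (math.pi * (4.0 + z * z))

for z in (0.0, 1.0, 3.0):
    print(z, cauchy_sum_pdf(z), closed_form(z))

# Density of the average (X + Y) / 2 at a point a is 2 * h(2a),
# which should match the standard Cauchy density at a.
print(2.0 * closed_form(2.0 * 0.7), cauchy_pdf(0.7))
```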

So far we have considered the convolution of two continuous independent 
random variables. However, the concept can be modified to the case when 
the random variables are discrete. 


Let X and Y be two independent discrete random variables, both taking on values
that are integers. Let Z = X + Y be the sum of the two random variables.
Hence Z takes values on the set of integers. Suppose that X = n, where n is
some integer. Then Z = z if and only if Y = z - n. Thus the event (Z = z)
is the union of the pairwise disjoint events (X = n and Y = z - n), where n
runs over the integers. The pdf h(z) of Z can be obtained as follows:

$$P(Z = z) = \sum_{n=-\infty}^{\infty} P(X = n)\, P(Y = z - n)$$

which is

$$h(z) = \sum_{n=-\infty}^{\infty} f(n)\, g(z - n),$$

where f(x) and g(y) are the pdfs of X and Y, respectively.

Definition 10.2. Let X and Y be two independent integer-valued discrete
random variables, with pdfs f(x) and g(y) respectively. Then the convolution
of f(x) and g(y) is the pdf h = f * g given by

$$h(m) = \sum_{n=-\infty}^{\infty} f(n)\, g(m - n),$$

for $m = \ldots, -2, -1, 0, 1, 2, \ldots$. The function h(z) is the pdf of the
discrete random variable Z = X + Y.

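Example 10.19. Let X and Y each represent the outcome of an independent roll
of a fair six-sided die. What is the probability density function of the
sum of X and Y?

Answer: Since the range of X as well as Y is {1, 2, 3, 4, 5, 6}, the range of
Z = X + Y is $R_Z = \{2, 3, 4, \ldots, 11, 12\}$. The pdf of Z is given by

$$
\begin{aligned}
h(2) &= f(1)\, g(1) = \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36} \\
h(3) &= f(1)\, g(2) + f(2)\, g(1) = \frac{1}{6} \cdot \frac{1}{6} + \frac{1}{6} \cdot \frac{1}{6} = \frac{2}{36} \\
h(4) &= f(1)\, g(3) + f(2)\, g(2) + f(3)\, g(1) = \frac{1}{6} \cdot \frac{1}{6} + \frac{1}{6} \cdot \frac{1}{6} + \frac{1}{6} \cdot \frac{1}{6} = \frac{3}{36}.
\end{aligned}
$$

Continuing in this manner we obtain h(5) = 4/36, h(6) = 5/36, h(7) = 6/36,
h(8) = 5/36, h(9) = 4/36, h(10) = 3/36, h(11) = 2/36, and h(12) = 1/36. Putting
these into one expression we have

$$h(z) = \frac{6 - |z - 7|}{36}, \qquad z = 2, 3, \ldots, 12.$$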



It is easy to note that the convolution operation is commutative as well
as associative. Using the associativity of the convolution operation one can
compute the pdf of the random variable $S_n = X_1 + X_2 + \cdots + X_n$, where
$X_1, X_2, \ldots, X_n$ are independent random variables each having the same pdf f(x). Then
the pdf of $S_1$ is f(x). Since $S_n = S_{n-1} + X_n$ and the pdf of $X_n$ is f(x), the
pdf of $S_n$ can be obtained by induction.
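The inductive computation of the pdf of $S_n$ can be sketched with exact arithmetic. The code below is illustrative: pdfs are stored as dictionaries from integer values to probabilities, and the names `convolve` and `pdf_of_sum` are hypothetical. It uses a fair six-sided die as the common pdf f.

```python
from fractions import Fraction

def convolve(f, g):
    """Discrete convolution h(m) = sum over n of f(n) g(m - n), for dict-based pdfs."""
    h = {}
    for n, fn in f.items():
        for k, gk in g.items():
            # The pair (n, k) contributes f(n) g(k) to the value m = n + k.
            h[n + k] = h.get(n + k, Fraction(0)) + fn * gk
    return h

def pdf_of_sum(f, count):
    """pdf of S_count = X_1 + ... + X_count, built up one convolution at a time."""
    result = dict(f)  # pdf of S_1
    for _ in range(count - 1):
        result = convolve(result, f)  # S_k = S_{k-1} + X_k
    return result

die = {face: Fraction(1, 6) for face in range(1, 7)}
two = pdf_of_sum(die, 2)
three = pdf_of_sum(die, 3)

for z in sorted(two):
    print(z, two[z], Fraction(6 - abs(z - 7), 36))
print("P(S_3 = 10) =", three[10])
```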
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10.5. Moment Generating Function Method 

We know that if X and Y are independent random variables, then 

$$M_{X+Y}(t) = M_X(t)\, M_Y(t).$$

This result can be used to find the distribution of the sum X + Y. Like the 
convolution method, this method can be used in finding the distribution of 
X + Y if X and Y are independent random variables. We briefly illustrate 
the method using the following example. 

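Example 10.20. Let X ~ POI($\lambda_1$) and Y ~ POI($\lambda_2$). What is the
probability density function of X + Y if X and Y are independent?

Answer: Since X ~ POI($\lambda_1$) and Y ~ POI($\lambda_2$), we get

$$M_X(t) = e^{\lambda_1 (e^t - 1)}$$

and

$$M_Y(t) = e^{\lambda_2 (e^t - 1)}.$$

Further, since X and Y are independent, we have

$$
\begin{aligned}
M_{X+Y}(t) &= M_X(t)\, M_Y(t) \\
&= e^{\lambda_1 (e^t - 1)}\, e^{\lambda_2 (e^t - 1)} \\
&= e^{\lambda_1 (e^t - 1) + \lambda_2 (e^t - 1)} \\
&= e^{(\lambda_1 + \lambda_2)(e^t - 1)},
\end{aligned}
$$

that is, X + Y ~ POI($\lambda_1 + \lambda_2$). Hence the density function h(z) of Z = X + Y
is given by

$$
h(z) = \begin{cases} \dfrac{e^{-(\lambda_1 + \lambda_2)}}{z!}\, (\lambda_1 + \lambda_2)^z & \text{for } z = 0, 1, 2, 3, \ldots \\ 0 & \text{otherwise.} \end{cases}
$$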

Compare this example to Example 10.13. You will see that the moment generating
function method has a definite advantage over the transformation method used there.
However, if you use the moment generating function method in Example 10.15, then you
will have a problem identifying the form of the density function of the random
variable X + Y. Thus, no single method works best in every case. Most of the time
we pick a particular method based on the type of problem at hand.
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Example 10.21. What is the probability density function of the sum of 
two independent random variables, each of which is gamma with parameters
$\theta$ and $\alpha$?

Answer: Let X and Y be two independent gamma random variables with
parameters $\theta$ and $\alpha$, that is X ~ GAM($\theta$, $\alpha$) and Y ~ GAM($\theta$, $\alpha$). From
Theorem 6.3, the moment generating functions of X and Y are obtained as
$M_X(t) = (1 - \theta t)^{-\alpha}$ and $M_Y(t) = (1 - \theta t)^{-\alpha}$, respectively. Since X and Y
are independent, we have

$$
\begin{aligned}
M_{X+Y}(t) &= M_X(t)\, M_Y(t) \\
&= (1 - \theta t)^{-\alpha}\, (1 - \theta t)^{-\alpha} \\
&= (1 - \theta t)^{-2\alpha}.
\end{aligned}
$$

Thus X + Y has the moment generating function of a gamma random variable
with parameters $\theta$ and $2\alpha$. Therefore

$$X + Y \sim GAM(\theta, 2\alpha).$$


10.6. Review Exercises

1. Let X be a continuous random variable with density function

$$
f(x) = \begin{cases} e^{-2x} + \frac{1}{2}\, e^{-x} & \text{for } 0 < x < \infty \\ 0 & \text{otherwise.} \end{cases}
$$

If $Y = e^{-2X}$, then what is the density function of Y where nonzero?

2. Suppose that X is a random variable with density function

$$
f(x) = \begin{cases} \frac{3}{8}\, x^2 & \text{for } 0 < x < 2 \\ 0 & \text{otherwise.} \end{cases}
$$

Let $Y = mX^2$, where m is a fixed positive number. What is the density
function of Y where nonzero?


3. Let X be a continuous random variable with density function

$$
f(x) = \begin{cases} 2 e^{-2x} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases}
$$

and let $Y = e^{X}$. What is the density function g(y) of Y where nonzero?
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4. What is the probability density of the sum of two independent random 
variables, each of which is uniformly distributed over the interval [—2, 2]? 


5. Let X and Y be random variables with joint density function 

$$
f(x,y) = \begin{cases} e^{-x} & \text{for } 0 < x < \infty;\ 0 < y < 1 \\ 0 & \text{elsewhere.} \end{cases}
$$

If Z = X + 2Y, then what is the joint density of X and Z where nonzero?

6. Let X be a continuous random variable with density function

$$
f(x) = \begin{cases} \dfrac{2}{x^2} & \text{for } 1 < x < 2 \\ 0 & \text{elsewhere.} \end{cases}
$$

If $Y = \sqrt{X}$, then what is the density function of Y for $1 < y < \sqrt{2}$?


7. What is the probability density of the sum of two independent random 
variables, each of which has the density function given by 


$$
f(x) = \begin{cases} \dfrac{10 - x}{50} & \text{for } 0 < x < 10 \\ 0 & \text{elsewhere?} \end{cases}
$$


8. What is the probability density of the sum of two independent random 
variables, each of which has the density function given by 


fix) = 


for a < x < oo 
elsewhere? 


9. Roll an unbiased die 3 times. If U denotes the outcome in the first roll, V
denotes the outcome in the second roll, and W denotes the outcome of the
third roll, what is the distribution of the random variable Z = max{U, V, W}?

10 . The probability density of V, the velocity of a gas molecule, by Maxwell- 
Boltzmann law is given by 


f(v) 



Q —h v 


for 0 < v < oo 


0 


otherwise, 


where h is Planck's constant. If m represents the mass of a gas molecule,
then what is the probability density of the kinetic energy $Z = \frac{1}{2} m V^2$?
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11. If the random variables X and Y have the joint density 


$$
f(x,y) = \begin{cases} \frac{6}{7}\, x & \text{for } 1 \le x + y \le 2,\ x \ge 0,\ y \ge 0 \\ 0 & \text{otherwise,} \end{cases}
$$

what is the joint density of U = 2X + 3Y and V = 4X + Y?

12. If the random variables X and Y have the joint density 

$$
f(x,y) = \begin{cases} \frac{6}{7}\, x & \text{for } 1 \le x + y \le 2,\ x \ge 0,\ y \ge 0 \\ 0 & \text{otherwise,} \end{cases}
$$

what is the density of 

13. Let X and Y have the joint probability density function 

$$
f(x,y) = \begin{cases} \frac{5}{16}\, x y^2 & \text{for } 0 < x < y < 2 \\ 0 & \text{elsewhere.} \end{cases}
$$


What is the joint density function of U = 3X — 2 Y and V = X + 2Y where 
it is nonzero? 


14. Let X and Y have the joint probability density function 

$$
f(x,y) = \begin{cases} 4x & \text{for } 0 < x < \sqrt{y} < 1 \\ 0 & \text{elsewhere.} \end{cases}
$$


What is the joint density function of U = 5X — 2Y and V = 3X + 2 Y where 
it is nonzero? 


15. Let X and Y have the joint probability density function 

$$
f(x,y) = \begin{cases} 4x & \text{for } 0 < x < \sqrt{y} < 1 \\ 0 & \text{elsewhere.} \end{cases}
$$

What is the density function of X — Y ? 

16. Let X and Y have the joint probability density function 

$$
f(x,y) = \begin{cases} 4x & \text{for } 0 < x < \sqrt{y} < 1 \\ 0 & \text{elsewhere.} \end{cases}
$$

What is the density function of $\frac{X}{Y}$?

17. Let X and Y have the joint probability density function 

$$
f(x,y) = \begin{cases} 4x & \text{for } 0 < x < \sqrt{y} < 1 \\ 0 & \text{elsewhere.} \end{cases}
$$

What is the density function of XY ? 

18. Let X and Y have the joint probability density function 

$$
f(x,y) = \begin{cases} \frac{5}{16}\, x y^2 & \text{for } 0 < x < y < 2 \\ 0 & \text{elsewhere.} \end{cases}
$$

What is the density function of ^? 

19. If X is a uniform random variable on the interval [0, 2] and Y is a uniform
random variable on the interval [0, 3], then what is the probability
density function of X + Y if they are independent?

20. What is the probability density function of the sum of two independent
random variables, each of which is binomial with parameters n and p?

21. What is the probability density function of the sum of two independent
random variables, each of which is exponential with mean $\theta$?

22. What is the probability density function of the average of two independent
random variables, each of which is Cauchy with parameter $\theta = 0$?

23. What is the probability density function of the average of two independent
random variables, each of which is normal with mean $\mu$ and variance
$\sigma^2$?

24. Both roots of the quadratic equation $x^2 + \alpha x + \beta = 0$ can take all values
from -1 to +1 with equal probabilities. What are the probability density
functions of the coefficients $\alpha$ and $\beta$?

25. If A, B, C are independent random variables uniformly distributed on 
the interval from zero to one, then what is the probability that the quadratic 
equation Ax 2 + Bx + C = 0 has real solutions? 

26. The price of a stock on a given trading day changes according to the 

distribution /(—1) = ?, /(0) = /(1) = and /(2) = |. Find the 

distribution for the change in stock price after two (independent) trading 
days. 
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Chapter 11 

SOME 

SPECIAL DISCRETE 
BIVARIATE DISTRIBUTIONS 


In this chapter, we shall examine some bivariate discrete probability density
functions. Ever since the first statistical use of the bivariate normal distribution
(which will be treated in Chapter 12) by Galton and Dickson in
1886, attempts have been made to develop families of bivariate distributions
to describe non-normal variations. In many textbooks, only the bivariate
normal distribution is treated. This is partly due to the dominant role the
bivariate normal distribution has played in statistical theory. Recently, however,
other bivariate distributions have started appearing in probability models
and statistical sampling problems. This chapter will focus on some
well-known bivariate discrete distributions whose marginal distributions are
well-known univariate distributions. The book of K.V. Mardia gives an excellent
exposition on various bivariate distributions.

11.1. Bivariate Bernoulli Distribution 

We define a bivariate Bernoulli random variable by specifying the form 
of the joint probability distribution. 

Definition 11.1. A discrete bivariate random variable (X, Y) is said to have
the bivariate Bernoulli distribution if its joint probability density is of the
form

$$
f(x,y) = \begin{cases} \dfrac{1}{x!\, y!\, (1 - x - y)!}\, p_1^{x}\, p_2^{y}\, (1 - p_1 - p_2)^{1 - x - y} & \text{if } x, y = 0, 1 \\ 0 & \text{otherwise,} \end{cases}
$$

where $0 < p_1, p_2, p_1 + p_2 < 1$ and $x + y \le 1$. We denote a bivariate Bernoulli
random variable by writing $(X, Y) \sim BER(p_1, p_2)$.

In the following theorem, we present the expected values and the vari¬ 
ances of X and Y, the covariance between X and Y, and their joint moment 
generating function. Recall that the joint moment generating function of X 
and Y is defined as M(s,t ) := E (e sX+tY ). 

Theorem 11.1. Let $(X, Y) \sim BER(p_1, p_2)$, where $p_1$ and $p_2$ are parameters.
Then

$$
\begin{aligned}
E(X) &= p_1 \\
E(Y) &= p_2 \\
Var(X) &= p_1 (1 - p_1) \\
Var(Y) &= p_2 (1 - p_2) \\
Cov(X, Y) &= -p_1 p_2 \\
M(s,t) &= 1 - p_1 - p_2 + p_1 e^{s} + p_2 e^{t}.
\end{aligned}
$$

Proof: First, we derive the joint moment generating function of X and Y and 
then establish the rest of the results from it. The joint moment generating 
function of X and Y is given by 

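$$
\begin{aligned}
M(s,t) &= E\left(e^{sX + tY}\right) \\
&= \sum_{x=0}^{1} \sum_{y=0}^{1} f(x,y)\, e^{sx + ty} \\
&= f(0,0) + f(1,0)\, e^{s} + f(0,1)\, e^{t} + f(1,1)\, e^{t+s} \\
&= 1 - p_1 - p_2 + p_1 e^{s} + p_2 e^{t} + 0 \cdot e^{t+s} \\
&= 1 - p_1 - p_2 + p_1 e^{s} + p_2 e^{t}.
\end{aligned}
$$

The expected value of X is given by

$$
\begin{aligned}
E(X) &= \left. \frac{\partial M}{\partial s} \right|_{(0,0)} \\
&= \left. \frac{\partial}{\partial s} \left(1 - p_1 - p_2 + p_1 e^{s} + p_2 e^{t}\right) \right|_{(0,0)} \\
&= \left. p_1 e^{s} \right|_{(0,0)} \\
&= p_1.
\end{aligned}
$$

Similarly, the expected value of Y is given by

$$
\begin{aligned}
E(Y) &= \left. \frac{\partial M}{\partial t} \right|_{(0,0)} \\
&= \left. \frac{\partial}{\partial t} \left(1 - p_1 - p_2 + p_1 e^{s} + p_2 e^{t}\right) \right|_{(0,0)} \\
&= \left. p_2 e^{t} \right|_{(0,0)} \\
&= p_2.
\end{aligned}
$$

The product moment of X and Y is

$$
\begin{aligned}
E(XY) &= \left. \frac{\partial^2 M}{\partial t\, \partial s} \right|_{(0,0)} \\
&= \left. \frac{\partial^2}{\partial t\, \partial s} \left(1 - p_1 - p_2 + p_1 e^{s} + p_2 e^{t}\right) \right|_{(0,0)} \\
&= \left. \frac{\partial}{\partial t} \left(p_1 e^{s}\right) \right|_{(0,0)} \\
&= 0.
\end{aligned}
$$

Therefore the covariance of X and Y is

$$Cov(X, Y) = E(XY) - E(X)\, E(Y) = -p_1 p_2.$$

Similarly, it can be shown that

$$E(X^2) = p_1 \qquad \text{and} \qquad E(Y^2) = p_2.$$

Thus, we have

$$Var(X) = E(X^2) - E(X)^2 = p_1 - p_1^2 = p_1 (1 - p_1)$$

and

$$Var(Y) = E(Y^2) - E(Y)^2 = p_2 - p_2^2 = p_2 (1 - p_2).$$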

This completes the proof of the theorem. 
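The moments in Theorem 11.1 can be confirmed by direct enumeration of the four possible outcomes. The sketch below uses exact fractions; the parameter values $p_1 = 3/10$ and $p_2 = 1/2$ and the helper names are illustrative assumptions.

```python
from fractions import Fraction

def bernoulli_joint(p1, p2):
    """Joint pmf of the bivariate Bernoulli distribution BER(p1, p2)."""
    return {
        (0, 0): 1 - p1 - p2,
        (1, 0): p1,
        (0, 1): p2,
        (1, 1): Fraction(0),  # x + y <= 1 rules out (1, 1)
    }

def expectation(joint, func):
    """E[func(X, Y)] under the given joint pmf."""
    return sum(prob * func(x, y) for (x, y), prob in joint.items())

p1, p2 = Fraction(3, 10), Fraction(1, 2)  # illustrative parameter values
joint = bernoulli_joint(p1, p2)

mean_x = expectation(joint, lambda x, y: x)
mean_y = expectation(joint, lambda x, y: y)
var_x = expectation(joint, lambda x, y: x * x) - mean_x ** 2
var_y = expectation(joint, lambda x, y: y * y) - mean_y ** 2
cov_xy = expectation(joint, lambda x, y: x * y) - mean_x * mean_y

print(mean_x, mean_y, var_x, var_y, cov_xy)
```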

The next theorem presents some information regarding the conditional 
distributions f(x/y) and f(y/x).
Theorem 11.2. Let $(X, Y) \sim BER(p_1, p_2)$, where $p_1$ and $p_2$ are parameters.
Then the conditional distributions f(y/x) and f(x/y) are also Bernoulli
and

$$
\begin{aligned}
E(Y/x) &= \frac{p_2 (1 - x)}{1 - p_1} \\
E(X/y) &= \frac{p_1 (1 - y)}{1 - p_2} \\
Var(Y/x) &= \frac{p_2 (1 - p_1 - p_2)(1 - x)}{(1 - p_1)^2} \\
Var(X/y) &= \frac{p_1 (1 - p_1 - p_2)(1 - y)}{(1 - p_2)^2}.
\end{aligned}
$$

Proof: Notice that 


$$
\begin{aligned}
f(y/x) &= \frac{f(x,y)}{f_1(x)} \\
&= \frac{f(x,y)}{\sum_{y=0}^{1} f(x,y)} \\
&= \frac{f(x,y)}{f(x,0) + f(x,1)}, \qquad x = 0, 1;\ y = 0, 1;\ 0 \le x + y \le 1.
\end{aligned}
$$

Hence

$$f(1/0) = \frac{f(0,1)}{f(0,0) + f(0,1)} = \frac{p_2}{1 - p_1 - p_2 + p_2} = \frac{p_2}{1 - p_1}$$

and

$$f(1/1) = \frac{f(1,1)}{f(1,0) + f(1,1)} = \frac{0}{p_1 + 0} = 0.$$

Now we compute the conditional expectation E(Y/x) for x = 0, 1. Hence

$$E(Y/x = 0) = \sum_{y=0}^{1} y\, f(y/0) = f(1/0) = \frac{p_2}{1 - p_1}$$

and

$$E(Y/x = 1) = f(1/1) = 0.$$

Merging these together, we have

$$E(Y/x) = \frac{p_2 (1 - x)}{1 - p_1}, \qquad x = 0, 1.$$

Similarly, we compute

$$E(Y^2/x = 0) = \sum_{y=0}^{1} y^2 f(y/0) = f(1/0) = \frac{p_2}{1 - p_1}$$

and

$$E(Y^2/x = 1) = f(1/1) = 0.$$

Therefore

$$
\begin{aligned}
Var(Y/x = 0) &= E(Y^2/x = 0) - E(Y/x = 0)^2 \\
&= \frac{p_2}{1 - p_1} - \left(\frac{p_2}{1 - p_1}\right)^2 \\
&= \frac{p_2 (1 - p_1) - p_2^2}{(1 - p_1)^2} \\
&= \frac{p_2 (1 - p_1 - p_2)}{(1 - p_1)^2}
\end{aligned}
$$

and

$$Var(Y/x = 1) = 0.$$

Merging these together, we have

$$Var(Y/x) = \frac{p_2 (1 - p_1 - p_2)(1 - x)}{(1 - p_1)^2}, \qquad x = 0, 1.$$

The conditional expectation E(X/y) and the conditional variance Var(X/y) 
can be obtained in a similar manner. We leave their derivations to the reader. 

11.2. Bivariate Binomial Distribution 

The bivariate binomial random variable is defined by specifying the form 
of the joint probability distribution. 
Definition 11.2. A discrete bivariate random variable (X, Y) is said to
have the bivariate binomial distribution with parameters $n, p_1, p_2$ if its joint
probability density is of the form

$$
f(x,y) = \begin{cases} \dfrac{n!}{x!\, y!\, (n - x - y)!}\, p_1^{x}\, p_2^{y}\, (1 - p_1 - p_2)^{n - x - y} & \text{if } x, y = 0, 1, \ldots, n \\ 0 & \text{otherwise,} \end{cases}
$$

where $0 < p_1, p_2, p_1 + p_2 < 1$, $x + y \le n$ and n is a positive integer. We denote
a bivariate binomial random variable by writing $(X, Y) \sim BIN(n, p_1, p_2)$.

The bivariate binomial distribution is also known as the trinomial distribution.
It will be shown in the proof of Theorem 11.4 that the marginal distributions
of X and Y are $BIN(n, p_1)$ and $BIN(n, p_2)$, respectively.

The following two examples illustrate the applicability of the bivariate binomial
distribution.


Example 11.1. In the city of Louisville on a Friday night, radio station A 
has 50 percent listeners, radio station B has 30 percent listeners, and radio 
station C has 20 percent listeners. What is the probability that among 8 
listeners in the city of Louisville, randomly chosen on a Friday night, 5 will 
be listening to station A, 2 will be listening to station B , and 1 will be 
listening to station C? 

Answer: Let X denote the number of listeners that listen to station A, and
Y denote the number of listeners that listen to station B. Then the joint distribution
of X and Y is bivariate binomial with n = 8, $p_1 = \frac{5}{10}$, and $p_2 = \frac{3}{10}$. The
probability that among 8 listeners in the city of Louisville, randomly chosen
on a Friday night, 5 will be listening to station A, 2 will be listening to station
B, and 1 will be listening to station C is given by

$$
\begin{aligned}
P(X = 5, Y = 2) &= f(5, 2) \\
&= \frac{n!}{x!\, y!\, (n - x - y)!}\, p_1^{x}\, p_2^{y}\, (1 - p_1 - p_2)^{n - x - y} \\
&= \frac{8!}{5!\, 2!\, 1!} \left(\frac{5}{10}\right)^5 \left(\frac{3}{10}\right)^2 \left(\frac{2}{10}\right) \\
&= 0.0945.
\end{aligned}
$$


Example 11.2. A certain game involves rolling a fair die and watching the
numbers of rolls of 4 and 5. What is the probability that in 10 rolls of the
die one 4 and three 5s will be observed?

Answer: Let X denote the number of 4s and Y denote the number of 5s.
Then the joint distribution of X and Y is bivariate binomial with n = 10,
$p_1 = \frac{1}{6}$, $p_2 = \frac{1}{6}$ and $1 - p_1 - p_2 = \frac{4}{6}$. Hence the probability that in 10 rolls
of the die one 4 and three 5s will be observed is

$$
\begin{aligned}
P(X = 1, Y = 3) &= f(1, 3) \\
&= \frac{n!}{x!\, y!\, (n - x - y)!}\, p_1^{x}\, p_2^{y}\, (1 - p_1 - p_2)^{n - x - y} \\
&= \frac{10!}{1!\, 3!\, (10 - 1 - 3)!} \left(\frac{1}{6}\right)^1 \left(\frac{1}{6}\right)^3 \left(\frac{4}{6}\right)^{10 - 1 - 3} \\
&= \frac{10!}{1!\, 3!\, (10 - 1 - 3)!} \left(\frac{1}{6}\right)^4 \left(\frac{4}{6}\right)^6 \\
&= \frac{573440}{10077696} \\
&= 0.0569.
\end{aligned}
$$
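Both examples evaluate the same trinomial formula, so they can be reproduced with one small function. This is a sketch only; the function name `trinomial_pmf` and its argument order are illustrative choices.

```python
import math

def trinomial_pmf(x, y, n, p1, p2):
    """Joint pmf f(x, y) of the bivariate binomial (trinomial) distribution."""
    if x < 0 or y < 0 or x + y > n:
        return 0.0
    # Multinomial coefficient n! / (x! y! (n - x - y)!), computed exactly.
    coeff = math.factorial(n) // (
        math.factorial(x) * math.factorial(y) * math.factorial(n - x - y)
    )
    return coeff * p1 ** x * p2 ** y * (1.0 - p1 - p2) ** (n - x - y)

radio = trinomial_pmf(5, 2, 8, 0.5, 0.3)      # Example 11.1
dice = trinomial_pmf(1, 3, 10, 1 / 6, 1 / 6)  # Example 11.2

print(round(radio, 4))
print(round(dice, 4))
```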


Using the transformation method discussed in Chapter 10, it can be shown
that if $X_1$, $X_2$ and $X_3$ are independent binomial random variables, then the
joint distribution of the random variables

$$X = X_1 + X_2 \qquad \text{and} \qquad Y = X_1 + X_3$$

is bivariate binomial. This approach is known as the trivariate reduction technique
for constructing bivariate distributions.

To establish the next theorem, we need a generalization of the binomial
theorem which was treated in Chapter 1. The following result generalizes the
binomial theorem and can be called the trinomial theorem. Similar to the proof
of the binomial theorem, one can establish

$$(a + b + c)^n = \sum_{x=0}^{n} \sum_{y=0}^{n} \binom{n}{x,\, y}\, a^{x}\, b^{y}\, c^{n - x - y},$$

where $0 \le x + y \le n$ and

$$\binom{n}{x,\, y} = \frac{n!}{x!\, y!\, (n - x - y)!}.$$

In the following theorem, we present the expected values of X and Y, 
their variances, the covariance between X and Y, and the joint moment 
generating function. 
Theorem 11.3. Let $(X, Y) \sim BIN(n, p_1, p_2)$, where n, $p_1$ and $p_2$ are
parameters. Then

$$
\begin{aligned}
E(X) &= n p_1 \\
E(Y) &= n p_2 \\
Var(X) &= n p_1 (1 - p_1) \\
Var(Y) &= n p_2 (1 - p_2) \\
Cov(X, Y) &= -n p_1 p_2 \\
M(s,t) &= \left(1 - p_1 - p_2 + p_1 e^{s} + p_2 e^{t}\right)^n.
\end{aligned}
$$

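Proof: First, we find the joint moment generating function of X and Y. The
moment generating function M(s, t) is given by

$$
\begin{aligned}
M(s,t) &= E\left(e^{sX + tY}\right) \\
&= \sum_{x=0}^{n} \sum_{y=0}^{n} e^{sx + ty} f(x,y) \\
&= \sum_{x=0}^{n} \sum_{y=0}^{n} e^{sx + ty}\, \frac{n!}{x!\, y!\, (n - x - y)!}\, p_1^{x}\, p_2^{y}\, (1 - p_1 - p_2)^{n - x - y} \\
&= \sum_{x=0}^{n} \sum_{y=0}^{n} \frac{n!}{x!\, y!\, (n - x - y)!}\, \left(e^{s} p_1\right)^{x} \left(e^{t} p_2\right)^{y} (1 - p_1 - p_2)^{n - x - y} \\
&= \left(1 - p_1 - p_2 + p_1 e^{s} + p_2 e^{t}\right)^n \qquad (\text{by the trinomial theorem}).
\end{aligned}
$$

The expected value of X is given by

$$
\begin{aligned}
E(X) &= \left. \frac{\partial M}{\partial s} \right|_{(0,0)} \\
&= \left. \frac{\partial}{\partial s} \left(1 - p_1 - p_2 + p_1 e^{s} + p_2 e^{t}\right)^n \right|_{(0,0)} \\
&= \left. n \left(1 - p_1 - p_2 + p_1 e^{s} + p_2 e^{t}\right)^{n-1} p_1 e^{s} \right|_{(0,0)} \\
&= n p_1.
\end{aligned}
$$

Similarly, the expected value of Y is given by

$$
\begin{aligned}
E(Y) &= \left. \frac{\partial M}{\partial t} \right|_{(0,0)} \\
&= \left. \frac{\partial}{\partial t} \left(1 - p_1 - p_2 + p_1 e^{s} + p_2 e^{t}\right)^n \right|_{(0,0)} \\
&= \left. n \left(1 - p_1 - p_2 + p_1 e^{s} + p_2 e^{t}\right)^{n-1} p_2 e^{t} \right|_{(0,0)} \\
&= n p_2.
\end{aligned}
$$

The product moment of X and Y is

$$
\begin{aligned}
E(XY) &= \left. \frac{\partial^2 M}{\partial t\, \partial s} \right|_{(0,0)} \\
&= \left. \frac{\partial^2}{\partial t\, \partial s} \left(1 - p_1 - p_2 + p_1 e^{s} + p_2 e^{t}\right)^n \right|_{(0,0)} \\
&= \left. \frac{\partial}{\partial t} \left( n \left(1 - p_1 - p_2 + p_1 e^{s} + p_2 e^{t}\right)^{n-1} p_1 e^{s} \right) \right|_{(0,0)} \\
&= n (n - 1)\, p_1 p_2.
\end{aligned}
$$

Therefore the covariance of X and Y is

$$Cov(X, Y) = E(XY) - E(X)\, E(Y) = n (n - 1)\, p_1 p_2 - n^2 p_1 p_2 = -n p_1 p_2.$$

Similarly, it can be shown that

$$E(X^2) = n (n - 1)\, p_1^2 + n p_1 \qquad \text{and} \qquad E(Y^2) = n (n - 1)\, p_2^2 + n p_2.$$

Thus, we have

$$
\begin{aligned}
Var(X) &= E(X^2) - E(X)^2 \\
&= n (n - 1)\, p_1^2 + n p_1 - n^2 p_1^2 \\
&= n p_1 (1 - p_1)
\end{aligned}
$$

and similarly

$$Var(Y) = E(Y^2) - E(Y)^2 = n p_2 (1 - p_2).$$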

This completes the proof of the theorem. 
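The formulas of Theorem 11.3 can be checked by summing over the support of the joint density with exact arithmetic. In the sketch below, the values n = 5, $p_1 = 1/6$, $p_2 = 2/6$ and the helper names `joint_pmf` and `moment` are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial

def joint_pmf(n, p1, p2):
    """Exact joint pmf of BIN(n, p1, p2) on the support x + y <= n."""
    pmf = {}
    for x in range(n + 1):
        for y in range(n + 1 - x):
            coeff = factorial(n) // (
                factorial(x) * factorial(y) * factorial(n - x - y)
            )
            pmf[(x, y)] = coeff * p1 ** x * p2 ** y * (1 - p1 - p2) ** (n - x - y)
    return pmf

def moment(pmf, func):
    """E[func(X, Y)] under the given joint pmf."""
    return sum(prob * func(x, y) for (x, y), prob in pmf.items())

n, p1, p2 = 5, Fraction(1, 6), Fraction(2, 6)  # illustrative values
pmf = joint_pmf(n, p1, p2)

mean_x = moment(pmf, lambda x, y: x)
mean_y = moment(pmf, lambda x, y: y)
var_x = moment(pmf, lambda x, y: x * x) - mean_x ** 2
var_y = moment(pmf, lambda x, y: y * y) - mean_y ** 2
cov_xy = moment(pmf, lambda x, y: x * y) - mean_x * mean_y

print(mean_x, mean_y, var_x, var_y, cov_xy)
```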

The following results are needed for the next theorem and they can be
established using the binomial theorem discussed in chapter 1. For any real
numbers a and b, we have

$$
\sum_{y=0}^{m} y \binom{m}{y} a^{y}\, b^{m-y} = m\,a\,(a + b)^{m-1} \tag{11.1}
$$

and

$$
\sum_{y=0}^{m} y^{2} \binom{m}{y} a^{y}\, b^{m-y} = m\,a\,(m\,a + b)\,(a + b)^{m-2} \tag{11.2}
$$

where m is a positive integer.
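
Identities (11.1) and (11.2) can be spot-checked with exact rational arithmetic. The following is only a sketch: the helper name `weighted_sum` and the test values of a, b and m are arbitrary choices made for this illustration, not part of the text.

```python
from fractions import Fraction
from math import comb


def weighted_sum(m, a, b, power):
    """Return sum over y = 0..m of y**power * C(m, y) * a**y * b**(m - y)."""
    return sum(y**power * comb(m, y) * a**y * b ** (m - y) for y in range(m + 1))


# Arbitrary rational test values (assumptions made for this illustration only).
a, b = Fraction(2, 7), Fraction(3, 5)

checks = []
for m in range(1, 8):
    rhs_first = m * a * (a + b) ** (m - 1)                  # right side of (11.1)
    rhs_second = m * a * (m * a + b) * (a + b) ** (m - 2)   # right side of (11.2)
    checks.append(weighted_sum(m, a, b, 1) == rhs_first
                  and weighted_sum(m, a, b, 2) == rhs_second)

print(all(checks))
```

Because `Fraction` arithmetic is exact, equality here is a genuine algebraic check for the chosen values rather than a floating-point approximation.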
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Example 11.3. If X equals the number of ones and Y equals the number of 
twos and threes when a pair of fair dice are rolled, then what is the correlation 
coefficient of X and Y? 


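Answer: The joint density of X and Y is bivariate binomial and is given by

$$
f(x, y) = \frac{2!}{x!\, y!\, (2-x-y)!} \left(\frac{1}{6}\right)^{x} \left(\frac{2}{6}\right)^{y} \left(\frac{3}{6}\right)^{2-x-y}, \qquad 0 \le x + y \le 2,
$$

where x and y are nonnegative integers. By Theorem 11.3, we have

$$
Var(X) = n\,p_1\,(1 - p_1) = 2 \cdot \frac{1}{6} \left(1 - \frac{1}{6}\right) = \frac{10}{36},
$$

$$
Var(Y) = n\,p_2\,(1 - p_2) = 2 \cdot \frac{2}{6} \left(1 - \frac{2}{6}\right) = \frac{16}{36},
$$

and

$$
Cov(X, Y) = -n\,p_1\,p_2 = -2 \cdot \frac{1}{6} \cdot \frac{2}{6} = -\frac{4}{36}.
$$

Therefore

$$
Corr(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X)\, Var(Y)}} = -\frac{4}{4\sqrt{10}} = -0.3162.
$$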
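
The correlation in Example 11.3 can also be obtained without Theorem 11.3, by summing over the six support points of the joint density. The sketch below does this; the function names are hypothetical and chosen only for this illustration.

```python
from math import factorial, sqrt


def trinomial_pmf(x, y, n, p1, p2):
    """Joint density of BIN(n, p1, p2) at (x, y); zero outside 0 <= x + y <= n."""
    if x < 0 or y < 0 or x + y > n:
        return 0.0
    coef = factorial(n) // (factorial(x) * factorial(y) * factorial(n - x - y))
    return coef * p1**x * p2**y * (1 - p1 - p2) ** (n - x - y)


def correlation(n, p1, p2):
    """Correlation of X and Y obtained by summing over the whole support."""
    pts = [(x, y, trinomial_pmf(x, y, n, p1, p2))
           for x in range(n + 1) for y in range(n + 1 - x)]
    ex = sum(x * f for x, _, f in pts)
    ey = sum(y * f for _, y, f in pts)
    var_x = sum(x * x * f for x, _, f in pts) - ex**2
    var_y = sum(y * y * f for _, y, f in pts) - ey**2
    cov = sum(x * y * f for x, y, f in pts) - ex * ey
    return cov / sqrt(var_x * var_y)


# Two fair dice: a one has probability 1/6, a two or a three has probability 2/6.
rho = correlation(2, 1 / 6, 2 / 6)
print(round(rho, 4))
```

The printed value can be compared with the correlation coefficient computed in the answer above.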


The next theorem presents some information regarding the conditional
distributions f(x/y) and f(y/x).

Theorem 11.4. Let $(X, Y) \sim \text{BIN}(n, p_1, p_2)$, where $n$, $p_1$ and $p_2$ are
parameters. Then the conditional distributions f(y/x) and f(x/y) are also
binomial and

$$
\begin{aligned}
E(Y/x) &= \frac{p_2\,(n - x)}{1 - p_1} \\
E(X/y) &= \frac{p_1\,(n - y)}{1 - p_2} \\
Var(Y/x) &= \frac{p_2\,(1 - p_1 - p_2)\,(n - x)}{(1 - p_1)^{2}} \\
Var(X/y) &= \frac{p_1\,(1 - p_1 - p_2)\,(n - y)}{(1 - p_2)^{2}}.
\end{aligned}
$$

Proof: Since $f(y/x) = \frac{f(x,y)}{f_1(x)}$, first we find the marginal density of X. The
marginal density $f_1(x)$ of X is given by

$$
\begin{aligned}
f_1(x) &= \sum_{y=0}^{n-x} \frac{n!}{x!\, y!\, (n-x-y)!}\; p_1^{x}\, p_2^{y}\, (1 - p_1 - p_2)^{n-x-y} \\
&= \frac{n!\, p_1^{x}}{x!\, (n-x)!} \sum_{y=0}^{n-x} \frac{(n-x)!}{y!\, (n-x-y)!}\; p_2^{y}\, (1 - p_1 - p_2)^{n-x-y} \\
&= \binom{n}{x} p_1^{x}\, (1 - p_1 - p_2 + p_2)^{n-x} \qquad \text{(by binomial theorem)} \\
&= \binom{n}{x} p_1^{x}\, (1 - p_1)^{n-x}.
\end{aligned}
$$

In order to compute the conditional expectations, we need the conditional
densities. The conditional density of Y given X = x is

$$
\begin{aligned}
f(y/x) &= \frac{f(x, y)}{f_1(x)} \\
&= \frac{f(x, y)}{\binom{n}{x} p_1^{x}\, (1 - p_1)^{n-x}} \\
&= \frac{(n-x)!}{(n-x-y)!\, y!}\; p_2^{y}\, (1 - p_1 - p_2)^{n-x-y}\, (1 - p_1)^{x-n} \\
&= (1 - p_1)^{x-n} \binom{n-x}{y} p_2^{y}\, (1 - p_1 - p_2)^{n-x-y}.
\end{aligned}
$$

Hence the conditional expectation of Y given the event X = x is

$$
\begin{aligned}
E(Y/x) &= \sum_{y=0}^{n-x} y\, (1 - p_1)^{x-n} \binom{n-x}{y} p_2^{y}\, (1 - p_1 - p_2)^{n-x-y} \\
&= (1 - p_1)^{x-n} \sum_{y=0}^{n-x} y \binom{n-x}{y} p_2^{y}\, (1 - p_1 - p_2)^{n-x-y} \\
&= (1 - p_1)^{x-n}\; p_2\,(n - x)\,(1 - p_1)^{n-x-1} \qquad \text{(by (11.1))} \\
&= \frac{p_2\,(n - x)}{1 - p_1}.
\end{aligned}
$$

Next, we find the conditional variance of Y given the event X = x. For this
we need the conditional expectation $E(Y^{2}/x)$, which is given by

$$
\begin{aligned}
E(Y^{2}/x) &= \sum_{y=0}^{n-x} y^{2}\, f(y/x) \\
&= \sum_{y=0}^{n-x} y^{2}\, (1 - p_1)^{x-n} \binom{n-x}{y} p_2^{y}\, (1 - p_1 - p_2)^{n-x-y} \\
&= (1 - p_1)^{x-n} \sum_{y=0}^{n-x} y^{2} \binom{n-x}{y} p_2^{y}\, (1 - p_1 - p_2)^{n-x-y} \\
&= (1 - p_1)^{x-n}\; p_2\,(n - x)\,(1 - p_1)^{n-x-2}\, \left[(n - x)\,p_2 + 1 - p_1 - p_2\right] \qquad \text{(by (11.2))} \\
&= \frac{p_2\,(n - x)\, \left[(n - x)\,p_2 + 1 - p_1 - p_2\right]}{(1 - p_1)^{2}}.
\end{aligned}
$$

Hence, the conditional variance of Y given X = x is

$$
\begin{aligned}
Var(Y/x) &= E(Y^{2}/x) - E(Y/x)^{2} \\
&= \frac{p_2\,(n - x)\, \left[(n - x)\,p_2 + 1 - p_1 - p_2\right]}{(1 - p_1)^{2}} - \left(\frac{p_2\,(n - x)}{1 - p_1}\right)^{2} \\
&= \frac{p_2\,(1 - p_1 - p_2)\,(n - x)}{(1 - p_1)^{2}}.
\end{aligned}
$$

Similarly, one can establish

$$
E(X/y) = \frac{p_1\,(n - y)}{1 - p_2} \qquad \text{and} \qquad Var(X/y) = \frac{p_1\,(1 - p_1 - p_2)\,(n - y)}{(1 - p_2)^{2}}.
$$

This completes the proof of the theorem.

Note that f(y/x) in the above theorem is a univariate binomial probability
density function. To see this observe that

$$
\begin{aligned}
f(y/x) &= \binom{n-x}{y} \frac{p_2^{y}\, (1 - p_1 - p_2)^{n-x-y}}{(1 - p_1)^{n-x}} \\
&= \binom{n-x}{y} \left(\frac{p_2}{1 - p_1}\right)^{y} \left(1 - \frac{p_2}{1 - p_1}\right)^{n-x-y}.
\end{aligned}
$$

Hence, f(y/x) is the probability density function of a binomial random variable
with parameters $n - x$ and $\frac{p_2}{1 - p_1}$.

The marginal density $f_2(y)$ of Y can be obtained similarly as

$$
f_2(y) = \binom{n}{y} p_2^{y}\, (1 - p_2)^{n-y},
$$

where $y = 0, 1, \ldots, n$. The form of these densities shows that the marginals of
the bivariate binomial distribution are again binomial.

Example 11.4. Let W equal the weight of soap in a 1-kilogram box that is
distributed in India. Suppose P(W < 1) = 0.02 and P(W > 1.072) = 0.08.
Call a box of soap light, good, or heavy depending on whether W < 1,
$1 \le W \le 1.072$, or W > 1.072, respectively. In a random sample of 50 boxes,
let X equal the number of light boxes and Y the number of good boxes.
What are the regression and scedastic curves of Y on X?

Answer: The joint probability density function of X and Y is given by

$$
f(x, y) = \frac{50!}{x!\, y!\, (50-x-y)!}\; p_1^{x}\, p_2^{y}\, (1 - p_1 - p_2)^{50-x-y}, \qquad 0 \le x + y \le 50,
$$

where x and y are nonnegative integers. Hence, $(X, Y) \sim \text{BIN}(n, p_1, p_2)$,
where $n = 50$, $p_1 = 0.02$ and $p_2 = 0.90$. The regression curve of Y on X is
given by

$$
E(Y/x) = \frac{p_2\,(n - x)}{1 - p_1} = \frac{0.9\,(50 - x)}{1 - 0.02} = \frac{45}{49}\,(50 - x).
$$

The scedastic curve of Y on X is the conditional variance of Y given X = x
and is equal to

$$
Var(Y/x) = \frac{p_2\,(1 - p_1 - p_2)\,(n - x)}{(1 - p_1)^{2}} = \frac{0.9 \cdot 0.08\,(50 - x)}{(1 - 0.02)^{2}} = \frac{180}{2401}\,(50 - x).
$$
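
The regression and scedastic curves of Example 11.4 can be recovered directly from the joint density, by forming f(x, y) / f_1(x) and summing over y. The sketch below uses exact rational arithmetic; the function names are hypothetical, and the three values of x are arbitrary sample points.

```python
from fractions import Fraction
from math import comb


def joint_pmf(x, y, n, p1, p2):
    """Bivariate binomial density f(x, y), evaluated exactly."""
    # comb(n, x) * comb(n - x, y) equals n! / (x! y! (n - x - y)!).
    return (comb(n, x) * comb(n - x, y)
            * p1**x * p2**y * (1 - p1 - p2) ** (n - x - y))


def conditional_mean_var(x, n, p1, p2):
    """E(Y/x) and Var(Y/x) computed from f(x, y) / f_1(x)."""
    weights = [joint_pmf(x, y, n, p1, p2) for y in range(n - x + 1)]
    marginal = sum(weights)  # this is f_1(x)
    mean = sum(y * w for y, w in enumerate(weights)) / marginal
    second = sum(y * y * w for y, w in enumerate(weights)) / marginal
    return mean, second - mean**2


n, p1, p2 = 50, Fraction(2, 100), Fraction(90, 100)
for x in (0, 5, 20):
    mean, var = conditional_mean_var(x, n, p1, p2)
    # Compare with the closed forms (45/49)(50 - x) and (180/2401)(50 - x).
    print(x, mean == Fraction(45, 49) * (50 - x), var == Fraction(180, 2401) * (50 - x))
```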


Note that if n = 1, then the bivariate binomial distribution reduces to the
bivariate Bernoulli distribution.

11.3. Bivariate Geometric Distribution 

Recall that if the random variable X denotes the trial number on which
the first success occurs, then X is univariate geometric. The probability density
function of a univariate geometric variable is

$$
f(x) = p^{x-1}\,(1 - p), \qquad x = 1, 2, 3, \ldots,
$$

where p is the probability of failure in a single Bernoulli trial. This univariate
geometric distribution can be generalized to the bivariate case. Guldberg
(1934) introduced the bivariate geometric distribution and Lundberg (1940)
first used it in connection with problems of accident proneness. This distribution
has found many applications in various statistical methods.

Definition 11.3. A discrete bivariate random variable (X, Y) is said to
have the bivariate geometric distribution with parameters $p_1$ and $p_2$ if its
joint probability density is of the form

$$
f(x, y) =
\begin{cases}
\dfrac{(x+y)!}{x!\, y!}\; p_1^{x}\, p_2^{y}\, (1 - p_1 - p_2) & \text{if } x, y = 0, 1, \ldots, \infty \\[2mm]
0 & \text{otherwise,}
\end{cases}
$$

where $0 < p_1,\ p_2,\ p_1 + p_2 < 1$. We denote a bivariate geometric random
variable by writing $(X, Y) \sim \text{GEO}(p_1, p_2)$.

Example 11.5. Motor vehicles arriving at an intersection can turn right
or left or continue straight ahead. In a study of traffic patterns at this
intersection over a long period of time, engineers have noted that 40 percent
of the motor vehicles turn left, 25 percent turn right, and the remainder
continue straight ahead. For the next ten cars entering the intersection,
what is the probability that 5 cars will turn left, 4 cars will turn right, and
the last car will go straight ahead?

Answer: Let X denote the number of cars turning left and Y denote the
number of cars turning right. Since the last car will go straight ahead,
the joint distribution of X and Y is geometric with parameters $p_1 = 0.4$,
$p_2 = 0.25$ and $p_3 = 1 - p_1 - p_2 = 0.35$. For the next ten cars entering the
intersection, the probability that 5 cars will turn left, 4 cars will turn right,
and the last car will go straight ahead is given by

$$
\begin{aligned}
P(X = 5, Y = 4) &= f(5, 4) \\
&= \frac{(x+y)!}{x!\, y!}\; p_1^{x}\, p_2^{y}\, (1 - p_1 - p_2) \\
&= \frac{(5+4)!}{5!\, 4!}\; (0.4)^{5}\, (0.25)^{4}\, (1 - 0.4 - 0.25) \\
&= \frac{9!}{5!\, 4!}\; (0.4)^{5}\, (0.25)^{4}\, (0.35) \\
&= 0.00176.
\end{aligned}
$$

The following technical result is needed to prove the next theorem.
If a and b are positive real numbers with 0 < a + b < 1, then

$$
\sum_{x=0}^{\infty} \sum_{y=0}^{\infty} \frac{(x+y)!}{x!\, y!}\; a^{x}\, b^{y} = \frac{1}{1 - a - b}. \tag{11.3}
$$
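
The arithmetic in Example 11.5 and the double series (11.3) are both easy to check numerically. The sketch below evaluates f(5, 4) from the bivariate geometric density and approximates the series by a truncated sum; the function names and the truncation length are assumptions made for this illustration.

```python
from math import comb


def geo_pmf(x, y, p1, p2):
    """Bivariate geometric density: (x + y)! / (x! y!) * p1^x * p2^y * (1 - p1 - p2)."""
    return comb(x + y, x) * p1**x * p2**y * (1 - p1 - p2)


def truncated_series(a, b, terms=120):
    """Partial sum of the double series on the left side of (11.3)."""
    return sum(comb(x + y, x) * a**x * b**y
               for x in range(terms) for y in range(terms))


# Example 11.5: 5 left turns, 4 right turns, then one car going straight.
prob = geo_pmf(5, 4, 0.4, 0.25)
print(prob)

# Both numbers below should agree closely when 0 < a + b < 1.
print(truncated_series(0.4, 0.25), 1 / (1 - 0.4 - 0.25))
```

The truncation at 120 terms is harmless here because the omitted terms are bounded by a geometric tail in a + b = 0.65.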


In the following theorem, we present the expected values and the variances
of X and Y, the covariance between X and Y, and the moment generating
function.

Theorem 11.5. Let $(X, Y) \sim \text{GEO}(p_1, p_2)$, where $p_1$ and $p_2$ are parameters.
Then

$$
\begin{aligned}
E(X) &= \frac{p_1}{1 - p_1 - p_2} \\
E(Y) &= \frac{p_2}{1 - p_1 - p_2} \\
Var(X) &= \frac{p_1\,(1 - p_2)}{(1 - p_1 - p_2)^{2}} \\
Var(Y) &= \frac{p_2\,(1 - p_1)}{(1 - p_1 - p_2)^{2}} \\
Cov(X, Y) &= \frac{p_1\,p_2}{(1 - p_1 - p_2)^{2}} \\
M(s, t) &= \frac{1 - p_1 - p_2}{1 - p_1 e^{s} - p_2 e^{t}}.
\end{aligned}
$$

Proof: We only find the joint moment generating function M(s, t) of X and
Y and leave the rest of the proof to the reader. The joint moment
generating function M(s, t) is given by

$$
\begin{aligned}
M(s, t) &= E\left(e^{sX + tY}\right) \\
&= \sum_{x=0}^{\infty} \sum_{y=0}^{\infty} e^{sx + ty}\, f(x, y) \\
&= \sum_{x=0}^{\infty} \sum_{y=0}^{\infty} e^{sx + ty}\, \frac{(x+y)!}{x!\, y!}\; p_1^{x}\, p_2^{y}\, (1 - p_1 - p_2) \\
&= (1 - p_1 - p_2) \sum_{x=0}^{\infty} \sum_{y=0}^{\infty} \frac{(x+y)!}{x!\, y!}\; \left(p_1 e^{s}\right)^{x} \left(p_2 e^{t}\right)^{y} \\
&= \frac{1 - p_1 - p_2}{1 - p_1 e^{s} - p_2 e^{t}} \qquad \text{(by (11.3)).}
\end{aligned}
$$
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The next theorem presents some information regarding the conditional 
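densities f(x/y) and f(y/x).

Theorem 11.6. Let $(X, Y) \sim \text{GEO}(p_1, p_2)$, where $p_1$ and $p_2$ are parameters.
Then the conditional distributions f(y/x) and f(x/y) are also geometrical
and

$$
\begin{aligned}
E(Y/x) &= \frac{p_2\,(1 + x)}{1 - p_2} \\
E(X/y) &= \frac{p_1\,(1 + y)}{1 - p_1} \\
Var(Y/x) &= \frac{p_2\,(1 + x)}{(1 - p_2)^{2}} \\
Var(X/y) &= \frac{p_1\,(1 + y)}{(1 - p_1)^{2}}.
\end{aligned}
$$

Proof: Again, as before, we first find the conditional probability density of
Y given the event X = x. The marginal density $f_1(x)$ is given by

$$
\begin{aligned}
f_1(x) &= \sum_{y=0}^{\infty} f(x, y) \\
&= \sum_{y=0}^{\infty} \frac{(x+y)!}{x!\, y!}\; p_1^{x}\, p_2^{y}\, (1 - p_1 - p_2) \\
&= (1 - p_1 - p_2)\, p_1^{x} \sum_{y=0}^{\infty} \frac{(x+y)!}{x!\, y!}\; p_2^{y} \\
&= \frac{(1 - p_1 - p_2)\, p_1^{x}}{(1 - p_2)^{x+1}} \qquad \text{(by (11.4)).}
\end{aligned}
$$

Therefore the conditional density of Y given the event X = x is

$$
f(y/x) = \frac{f(x, y)}{f_1(x)} = \frac{(x+y)!}{x!\, y!}\; p_2^{y}\, (1 - p_2)^{x+1}.
$$

The conditional expectation of Y given X = x is

$$
\begin{aligned}
E(Y/x) &= \sum_{y=0}^{\infty} y\, f(y/x) \\
&= \sum_{y=0}^{\infty} y\, \frac{(x+y)!}{x!\, y!}\; p_2^{y}\, (1 - p_2)^{x+1} \\
&= \frac{p_2\,(1 + x)}{1 - p_2} \qquad \text{(by (11.5)).}
\end{aligned}
$$

Similarly, one can show that

$$
E(X/y) = \frac{p_1\,(1 + y)}{1 - p_1}.
$$

To compute the conditional variance of Y given the event that X = x, first
we have to find $E(Y^{2}/x)$, which is given by

$$
\begin{aligned}
E(Y^{2}/x) &= \sum_{y=0}^{\infty} y^{2}\, f(y/x) \\
&= \sum_{y=0}^{\infty} y^{2}\, \frac{(x+y)!}{x!\, y!}\; p_2^{y}\, (1 - p_2)^{x+1} \\
&= \frac{p_2\,(1 + x)}{(1 - p_2)^{2}}\, \left[p_2\,(1 + x) + 1\right] \qquad \text{(by (11.6)).}
\end{aligned}
$$

Therefore

$$
\begin{aligned}
Var(Y/x) &= E(Y^{2}/x) - E(Y/x)^{2} \\
&= \frac{p_2\,(1 + x)}{(1 - p_2)^{2}}\, \left[p_2\,(1 + x) + 1\right] - \left(\frac{p_2\,(1 + x)}{1 - p_2}\right)^{2} \\
&= \frac{p_2\,(1 + x)}{(1 - p_2)^{2}}.
\end{aligned}
$$

The rest of the moments can be determined in a similar manner. The proof
of the theorem is now complete.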
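
The moment formulas for the bivariate geometric distribution can be checked numerically by truncating the infinite sums. The sketch below is only an illustration: the parameter values, the truncation lengths and the function names are assumptions, and the closed forms in the comments are E(X) = p1 / (1 - p1 - p2), Var(X) = p1 (1 - p2) / (1 - p1 - p2)^2, Cov(X, Y) = p1 p2 / (1 - p1 - p2)^2, E(Y/x) = p2 (1 + x) / (1 - p2) and Var(Y/x) = p2 (1 + x) / (1 - p2)^2.

```python
from math import comb


def geo_pmf(x, y, p1, p2):
    """Bivariate geometric density of Definition 11.3."""
    return comb(x + y, x) * p1**x * p2**y * (1 - p1 - p2)


def joint_moments(p1, p2, terms=150):
    """E(X), Var(X) and Cov(X, Y) from a truncated double sum."""
    ex = ey = exx = exy = 0.0
    for x in range(terms):
        for y in range(terms):
            f = geo_pmf(x, y, p1, p2)
            ex += x * f
            ey += y * f
            exx += x * x * f
            exy += x * y * f
    return ex, exx - ex**2, exy - ex * ey


def conditional_mean_var(x, p1, p2, terms=400):
    """E(Y/x) and Var(Y/x) from f(x, y) / f_1(x), with the sum over y truncated."""
    weights = [geo_pmf(x, y, p1, p2) for y in range(terms)]
    total = sum(weights)
    mean = sum(y * w for y, w in enumerate(weights)) / total
    second = sum(y * y * w for y, w in enumerate(weights)) / total
    return mean, second - mean**2


p1, p2 = 0.3, 0.2          # arbitrary illustrative parameters
q = 1 - p1 - p2
ex, var_x, cov = joint_moments(p1, p2)
print(ex, p1 / q)                      # E(X)
print(var_x, p1 * (1 - p2) / q**2)     # Var(X)
print(cov, p1 * p2 / q**2)             # Cov(X, Y)

cond_mean, cond_var = conditional_mean_var(3, p1, p2)
print(cond_mean, p2 * (1 + 3) / (1 - p2))        # E(Y/x) at x = 3
print(cond_var, p2 * (1 + 3) / (1 - p2) ** 2)    # Var(Y/x) at x = 3
```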
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11.4. Bivariate Negative Binomial Distribution 

The univariate negative binomial distribution can be generalized to the 
bivariate case. Guldberg (1934) introduced this distribution and Lundberg 
(1940) first used it in connection with problems of accident proneness. Arbous 
and Kerrich (1951) arrived at this distribution by mixing parameters of the 
bivariate Poisson distribution. 


Definition 11.4. A discrete bivariate random variable (X, Y) is said to have
the bivariate negative binomial distribution with parameters $k$, $p_1$ and $p_2$ if
its joint probability density is of the form

$$
f(x, y) =
\begin{cases}
\dfrac{(x+y+k-1)!}{x!\, y!\, (k-1)!}\; p_1^{x}\, p_2^{y}\, (1 - p_1 - p_2)^{k} & \text{if } x, y = 0, 1, \ldots, \infty \\[2mm]
0 & \text{otherwise,}
\end{cases}
$$

where $0 < p_1,\ p_2,\ p_1 + p_2 < 1$ and k is a positive integer. We
denote a bivariate negative binomial random variable by writing
$(X, Y) \sim \text{NBIN}(k, p_1, p_2)$.

Example 11.6. An experiment consists of selecting a marble at random and
with replacement from a box containing 10 white marbles, 15 black marbles
and 5 green marbles. What is the probability that it takes exactly 11 trials
to get 5 white, 3 black and the third green marble at the 11th trial?

Answer: Let X denote the number of white marbles and Y denote the
number of black marbles. The joint distribution of X and Y is bivariate
negative binomial with parameters $p_1 = \frac{1}{3}$, $p_2 = \frac{1}{2}$, and $k = 3$. Hence the
probability that it takes exactly 11 trials to get 5 white, 3 black and the third
green marble at the 11th trial is

$$
\begin{aligned}
P(X = 5, Y = 3) &= f(5, 3) \\
&= \frac{(x+y+k-1)!}{x!\, y!\, (k-1)!}\; p_1^{x}\, p_2^{y}\, (1 - p_1 - p_2)^{k} \\
&= \frac{(5+3+3-1)!}{5!\, 3!\, (3-1)!} \left(\frac{1}{3}\right)^{5} \left(\frac{1}{2}\right)^{3} \left(1 - \frac{1}{3} - \frac{1}{2}\right)^{3} \\
&= \frac{10!}{5!\, 3!\, 2!} \left(\frac{1}{3}\right)^{5} \left(\frac{1}{2}\right)^{3} \left(\frac{1}{6}\right)^{3} \\
&= \frac{35}{5832} \approx 0.0060.
\end{aligned}
$$
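
The probability in Example 11.6 can be evaluated exactly from the density in Definition 11.4, which avoids any rounding of 1/3 and 1/6. The sketch below does so with rational arithmetic; the function and variable names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def nbin_pmf(x, y, k, p1, p2):
    """Bivariate negative binomial density of Definition 11.4, evaluated exactly."""
    coef = Fraction(factorial(x + y + k - 1),
                    factorial(x) * factorial(y) * factorial(k - 1))
    return coef * p1**x * p2**y * (1 - p1 - p2) ** k


# 10 white, 15 black and 5 green marbles out of 30, drawn with replacement.
p_white, p_black = Fraction(10, 30), Fraction(15, 30)

# 5 white and 3 black before the third green, which arrives on trial 11.
prob = nbin_pmf(5, 3, 3, p_white, p_black)
print(prob, float(prob))
```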


The negative binomial theorem which was treated in chapter 5 can be
generalized to

$$
\sum_{x=0}^{\infty} \sum_{y=0}^{\infty} \frac{(x+y+k-1)!}{x!\, y!\, (k-1)!}\; p_1^{x}\, p_2^{y} = \frac{1}{(1 - p_1 - p_2)^{k}}. \tag{11.7}
$$

In the following theorem, we present the expected values and the variances
of X and Y, the covariance between X and Y, and the moment generating
function.

Theorem 11.7. Let $(X, Y) \sim \text{NBIN}(k, p_1, p_2)$, where $k$, $p_1$ and $p_2$ are
parameters. Then

$$
\begin{aligned}
E(X) &= \frac{k\,p_1}{1 - p_1 - p_2} \\
E(Y) &= \frac{k\,p_2}{1 - p_1 - p_2} \\
Var(X) &= \frac{k\,p_1\,(1 - p_2)}{(1 - p_1 - p_2)^{2}} \\
Var(Y) &= \frac{k\,p_2\,(1 - p_1)}{(1 - p_1 - p_2)^{2}} \\
Cov(X, Y) &= \frac{k\,p_1\,p_2}{(1 - p_1 - p_2)^{2}} \\
M(s, t) &= \frac{(1 - p_1 - p_2)^{k}}{\left(1 - p_1 e^{s} - p_2 e^{t}\right)^{k}}.
\end{aligned}
$$

Proof: We only find the joint moment generating function M(s, t) of the
random variables X and Y and leave the rest to the reader. The joint
moment generating function is given by

$$
\begin{aligned}
M(s, t) &= E\left(e^{sX + tY}\right) \\
&= \sum_{x=0}^{\infty} \sum_{y=0}^{\infty} e^{sx + ty}\, f(x, y) \\
&= \sum_{x=0}^{\infty} \sum_{y=0}^{\infty} e^{sx + ty}\, \frac{(x+y+k-1)!}{x!\, y!\, (k-1)!}\; p_1^{x}\, p_2^{y}\, (1 - p_1 - p_2)^{k} \\
&= (1 - p_1 - p_2)^{k} \sum_{x=0}^{\infty} \sum_{y=0}^{\infty} \frac{(x+y+k-1)!}{x!\, y!\, (k-1)!}\; \left(e^{s} p_1\right)^{x} \left(e^{t} p_2\right)^{y} \\
&= \frac{(1 - p_1 - p_2)^{k}}{\left(1 - p_1 e^{s} - p_2 e^{t}\right)^{k}} \qquad \text{(by (11.7)).}
\end{aligned}
$$

This completes the proof of the theorem.

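To establish the next theorem, we need the following three results. If a is
a positive real constant in the interval (0, 1), x is a nonnegative integer
and k is a positive integer, then

$$
\sum_{y=0}^{\infty} \frac{(x+y+k-1)!}{y!\, (x+k-1)!}\; a^{y} = \frac{1}{(1 - a)^{x+k}}, \tag{11.8}
$$

$$
\sum_{y=0}^{\infty} y\, \frac{(x+y+k-1)!}{y!\, (x+k-1)!}\; a^{y} = \frac{a\,(x + k)}{(1 - a)^{x+k+1}}, \tag{11.9}
$$

and

$$
\sum_{y=0}^{\infty} y^{2}\, \frac{(x+y+k-1)!}{y!\, (x+k-1)!}\; a^{y} = \frac{a\,(x + k)}{(1 - a)^{x+k+2}}\, \left[1 + (x + k)\,a\right]. \tag{11.10}
$$

The next theorem presents some information regarding the conditional
densities f(x/y) and f(y/x).

Theorem 11.8. Let $(X, Y) \sim \text{NBIN}(k, p_1, p_2)$, where $k$, $p_1$ and $p_2$ are
parameters. Then the conditional densities f(y/x) and f(x/y) are also negative
binomial and

$$
\begin{aligned}
E(Y/x) &= \frac{p_2\,(k + x)}{1 - p_2} \\
E(X/y) &= \frac{p_1\,(k + y)}{1 - p_1} \\
Var(Y/x) &= \frac{p_2\,(k + x)}{(1 - p_2)^{2}} \\
Var(X/y) &= \frac{p_1\,(k + y)}{(1 - p_1)^{2}}.
\end{aligned}
$$

Proof: First, we find the marginal density of X. The marginal density $f_1(x)$
is given by

$$
\begin{aligned}
f_1(x) &= \sum_{y=0}^{\infty} f(x, y) \\
&= \sum_{y=0}^{\infty} \frac{(x+y+k-1)!}{x!\, y!\, (k-1)!}\; p_1^{x}\, p_2^{y}\, (1 - p_1 - p_2)^{k} \\
&= (1 - p_1 - p_2)^{k}\, p_1^{x}\, \frac{(x+k-1)!}{x!\, (k-1)!} \sum_{y=0}^{\infty} \frac{(x+y+k-1)!}{y!\, (x+k-1)!}\; p_2^{y} \\
&= \frac{(x+k-1)!}{x!\, (k-1)!}\; (1 - p_1 - p_2)^{k}\, p_1^{x}\, \frac{1}{(1 - p_2)^{x+k}} \qquad \text{(by (11.8)).}
\end{aligned}
$$

The conditional density of Y given the event X = x is

$$
f(y/x) = \frac{f(x, y)}{f_1(x)} = \frac{(x+y+k-1)!}{y!\, (x+k-1)!}\; p_2^{y}\, (1 - p_2)^{x+k}.
$$

The conditional expectation E(Y/x) is given by

$$
\begin{aligned}
E(Y/x) &= \sum_{y=0}^{\infty} y\, \frac{(x+y+k-1)!}{y!\, (x+k-1)!}\; p_2^{y}\, (1 - p_2)^{x+k} \\
&= (1 - p_2)^{x+k} \sum_{y=0}^{\infty} y\, \frac{(x+y+k-1)!}{y!\, (x+k-1)!}\; p_2^{y} \\
&= (1 - p_2)^{x+k}\, \frac{p_2\,(x + k)}{(1 - p_2)^{x+k+1}} \qquad \text{(by (11.9))} \\
&= \frac{p_2\,(x + k)}{1 - p_2}.
\end{aligned}
$$

The conditional expectation $E(Y^{2}/x)$ can be computed as follows:

$$
\begin{aligned}
E(Y^{2}/x) &= \sum_{y=0}^{\infty} y^{2}\, \frac{(x+y+k-1)!}{y!\, (x+k-1)!}\; p_2^{y}\, (1 - p_2)^{x+k} \\
&= (1 - p_2)^{x+k} \sum_{y=0}^{\infty} y^{2}\, \frac{(x+y+k-1)!}{y!\, (x+k-1)!}\; p_2^{y} \\
&= (1 - p_2)^{x+k}\, \frac{p_2\,(x + k)}{(1 - p_2)^{x+k+2}}\, \left[1 + (x + k)\,p_2\right] \qquad \text{(by (11.10))} \\
&= \frac{p_2\,(x + k)}{(1 - p_2)^{2}}\, \left[1 + (x + k)\,p_2\right].
\end{aligned}
$$

The conditional variance of Y given X = x is

$$
\begin{aligned}
Var(Y/x) &= E(Y^{2}/x) - E(Y/x)^{2} \\
&= \frac{p_2\,(x + k)}{(1 - p_2)^{2}}\, \left[1 + (x + k)\,p_2\right] - \left(\frac{p_2\,(x + k)}{1 - p_2}\right)^{2} \\
&= \frac{p_2\,(x + k)}{(1 - p_2)^{2}}.
\end{aligned}
$$

The conditional expected value E(X/y) and conditional variance Var(X/y)
can be computed in a similar way. This completes the proof.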

Note that if k = 1, then bivariate negative binomial distribution reduces 
to bivariate geometric distribution. 
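
Both remarks above lend themselves to a quick numerical check: the conditional mean and variance of Y given X = x for the bivariate negative binomial distribution, and the reduction to the bivariate geometric density when k = 1. The sketch below truncates the sum over y; the parameter values, truncation length and function names are assumptions made for this illustration, and the closed forms in the comments are E(Y/x) = p2 (k + x) / (1 - p2) and Var(Y/x) = p2 (k + x) / (1 - p2)^2.

```python
from math import comb, factorial


def nbin_pmf(x, y, k, p1, p2):
    """Bivariate negative binomial density of Definition 11.4."""
    coef = factorial(x + y + k - 1) // (factorial(x) * factorial(y) * factorial(k - 1))
    return coef * p1**x * p2**y * (1 - p1 - p2) ** k


def conditional_mean_var(x, k, p1, p2, terms=300):
    """E(Y/x) and Var(Y/x) from f(x, y) / f_1(x), with the sum over y truncated."""
    weights = [nbin_pmf(x, y, k, p1, p2) for y in range(terms)]
    total = sum(weights)
    mean = sum(y * w for y, w in enumerate(weights)) / total
    second = sum(y * y * w for y, w in enumerate(weights)) / total
    return mean, second - mean**2


k, p1, p2, x = 3, 0.3, 0.2, 2     # arbitrary illustrative values
cond_mean, cond_var = conditional_mean_var(x, k, p1, p2)
print(cond_mean, p2 * (k + x) / (1 - p2))        # E(Y/x)
print(cond_var, p2 * (k + x) / (1 - p2) ** 2)    # Var(Y/x)

# With k = 1 the density coincides with the bivariate geometric density.
reduces_to_geometric = all(
    abs(nbin_pmf(i, j, 1, p1, p2) - comb(i + j, i) * p1**i * p2**j * (1 - p1 - p2)) < 1e-15
    for i in range(6) for j in range(6)
)
print(reduces_to_geometric)
```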

11.5. Bivariate Hypergeometric Distribution 

The univariate hypergeometric distribution can be generalized to the
bivariate case. Isserlis (1914) introduced this distribution and Pearson (1924)
gave various properties of this distribution. Pearson also fitted this
distribution to observed data on the number of cards of a certain suit in two
hands at whist.


Definition 11.5. A discrete bivariate random variable (X, Y) is said to have
the bivariate hypergeometric distribution with parameters $r$, $n_1$, $n_2$, $n_3$ if its
joint probability distribution is of the form

$$
f(x, y) =
\begin{cases}
\dfrac{\binom{n_1}{x} \binom{n_2}{y} \binom{n_3}{r-x-y}}{\binom{n_1+n_2+n_3}{r}} & \text{if } x, y = 0, 1, \ldots, r \\[3mm]
0 & \text{otherwise,}
\end{cases}
$$

where $x \le n_1$, $y \le n_2$, $r - x - y \le n_3$ and r is a positive integer less than or
equal to $n_1 + n_2 + n_3$. We denote a bivariate hypergeometric random variable
by writing $(X, Y) \sim \text{HYP}(r, n_1, n_2, n_3)$.


Example 11.7. A panel of prospective jurors includes 7 African Americans, 3
Asian Americans and 9 white Americans. If the selection is random, what is the
probability that a jury will consist of 4 African Americans, 3 Asian Americans
and 5 white Americans?

Answer: Here $n_1 = 7$, $n_2 = 3$ and $n_3 = 9$ so that $n = 19$. A total of 12
jurors will be selected so that $r = 12$. In this example $x = 4$, $y = 3$ and
$r - x - y = 5$. Hence the probability that a jury will consist of 4 African
Americans, 3 Asian Americans and 5 white Americans is

$$
f(4, 3) = \frac{\binom{7}{4} \binom{3}{3} \binom{9}{5}}{\binom{19}{12}} = \frac{4410}{50388} = 0.0875.
$$


Example 11.8. Among 25 silver dollars struck in 1903 there are 15 from
the Philadelphia mint, 7 from the New Orleans mint, and 3 from the San
Francisco mint. If 5 of these silver dollars are picked at random, what is the
probability of getting 4 from the Philadelphia mint and 1 from the New
Orleans mint?

Answer: Here $n = 25$, $r = 5$ and $n_1 = 15$, $n_2 = 7$, $n_3 = 3$. Then the
probability of getting 4 from the Philadelphia mint and 1 from the New
Orleans mint is

$$
f(4, 1) = \frac{\binom{15}{4} \binom{7}{1} \binom{3}{0}}{\binom{25}{5}} = \frac{9555}{53130} = 0.1798.
$$
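
Both hypergeometric examples reduce to a ratio of binomial coefficients, which the standard library can evaluate directly. In the sketch below the function name `hyp_pmf` is hypothetical, and the parameter values are the ones used in the two answers (n_1 = 7, n_2 = 3, n_3 = 9, r = 12 and n_1 = 15, n_2 = 7, n_3 = 3, r = 5).

```python
from math import comb


def hyp_pmf(x, y, r, n1, n2, n3):
    """Bivariate hypergeometric density HYP(r, n1, n2, n3) at (x, y)."""
    z = r - x - y                      # number drawn from the third group
    if x < 0 or y < 0 or z < 0:
        return 0.0
    # comb(n, k) returns 0 when k > n, so oversized counts get probability 0.
    return comb(n1, x) * comb(n2, y) * comb(n3, z) / comb(n1 + n2 + n3, r)


jury = hyp_pmf(4, 3, 12, 7, 3, 9)      # Example 11.7
coins = hyp_pmf(4, 1, 5, 15, 7, 3)     # Example 11.8
print(round(jury, 4), round(coins, 4))
```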


In the following theorem, we present the expected values and the variances
of X and Y, and the covariance between X and Y.

Theorem 11.9. Let $(X, Y) \sim \text{HYP}(r, n_1, n_2, n_3)$, where $r$, $n_1$, $n_2$ and $n_3$
are parameters. Then

$$
\begin{aligned}
E(X) &= \frac{r\,n_1}{n_1 + n_2 + n_3} \\
E(Y) &= \frac{r\,n_2}{n_1 + n_2 + n_3} \\
Var(X) &= \frac{r\,n_1\,(n_2 + n_3)}{(n_1 + n_2 + n_3)^{2}} \left(\frac{n_1 + n_2 + n_3 - r}{n_1 + n_2 + n_3 - 1}\right) \\
Var(Y) &= \frac{r\,n_2\,(n_1 + n_3)}{(n_1 + n_2 + n_3)^{2}} \left(\frac{n_1 + n_2 + n_3 - r}{n_1 + n_2 + n_3 - 1}\right) \\
Cov(X, Y) &= -\frac{r\,n_1\,n_2}{(n_1 + n_2 + n_3)^{2}} \left(\frac{n_1 + n_2 + n_3 - r}{n_1 + n_2 + n_3 - 1}\right).
\end{aligned}
$$


Proof: We find only the mean and variance of X. The mean and variance
of Y can be found in a similar manner. The covariance of X and Y will be
left to the reader as an exercise. To find the expected value of X, we need
the marginal density $f_1(x)$ of X. The marginal of X is given by

$$
\begin{aligned}
f_1(x) &= \sum_{y=0}^{r-x} f(x, y) \\
&= \sum_{y=0}^{r-x} \frac{\binom{n_1}{x} \binom{n_2}{y} \binom{n_3}{r-x-y}}{\binom{n_1+n_2+n_3}{r}} \\
&= \frac{\binom{n_1}{x}}{\binom{n_1+n_2+n_3}{r}} \sum_{y=0}^{r-x} \binom{n_2}{y} \binom{n_3}{r-x-y} \\
&= \frac{\binom{n_1}{x}}{\binom{n_1+n_2+n_3}{r}} \binom{n_2+n_3}{r-x} \qquad \text{(by Theorem 1.3).}
\end{aligned}
$$

This shows that $X \sim \text{HYP}(n_1, n_2 + n_3, r)$. Hence, by Theorem 5.7, we get

$$
E(X) = \frac{r\,n_1}{n_1 + n_2 + n_3},
$$

and

$$
Var(X) = \frac{r\,n_1\,(n_2 + n_3)}{(n_1 + n_2 + n_3)^{2}} \left(\frac{n_1 + n_2 + n_3 - r}{n_1 + n_2 + n_3 - 1}\right).
$$

Similarly, the random variable $Y \sim \text{HYP}(n_2, n_1 + n_3, r)$. Hence, again by
Theorem 5.7, we get

$$
E(Y) = \frac{r\,n_2}{n_1 + n_2 + n_3},
$$

and

$$
Var(Y) = \frac{r\,n_2\,(n_1 + n_3)}{(n_1 + n_2 + n_3)^{2}} \left(\frac{n_1 + n_2 + n_3 - r}{n_1 + n_2 + n_3 - 1}\right).
$$

The next theorem presents some information regarding the conditional
densities f(x/y) and f(y/x).

Theorem 11.10. Let $(X, Y) \sim \text{HYP}(r, n_1, n_2, n_3)$, where $r$, $n_1$, $n_2$ and $n_3$
are parameters. Then the conditional distributions f(y/x) and f(x/y) are
also hypergeometric and

$$
\begin{aligned}
E(Y/x) &= \frac{n_2\,(r - x)}{n_2 + n_3} \\
E(X/y) &= \frac{n_1\,(r - y)}{n_1 + n_3} \\
Var(Y/x) &= \frac{n_2\,n_3}{n_2 + n_3 - 1} \left(\frac{r - x}{n_2 + n_3}\right) \left(\frac{n_2 + n_3 - r + x}{n_2 + n_3}\right) \\
Var(X/y) &= \frac{n_1\,n_3}{n_1 + n_3 - 1} \left(\frac{r - y}{n_1 + n_3}\right) \left(\frac{n_1 + n_3 - r + y}{n_1 + n_3}\right).
\end{aligned}
$$

Proof: To find E(Y/x), we need the conditional density f(y/x) of Y given
the event X = x. The conditional density f(y/x) is given by

$$
f(y/x) = \frac{f(x, y)}{f_1(x)} = \frac{\binom{n_2}{y} \binom{n_3}{r-x-y}}{\binom{n_2+n_3}{r-x}}.
$$

Hence, the random variable Y given X = x is a hypergeometric random
variable with parameters $n_2$, $n_3$, and $r - x$, that is

$$
Y/x \sim \text{HYP}(n_2, n_3, r - x).
$$

Hence, by Theorem 5.7, we get

$$
E(Y/x) = \frac{n_2\,(r - x)}{n_2 + n_3}
$$

and

$$
Var(Y/x) = \frac{n_2\,n_3}{n_2 + n_3 - 1} \left(\frac{r - x}{n_2 + n_3}\right) \left(\frac{n_2 + n_3 - r + x}{n_2 + n_3}\right).
$$
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Similarly, one can find E(X/y) and Var(X/y). The proof of the theorem is 
now complete. 
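
The conditional mean and variance of Y given X = x can be checked exactly by forming f(x, y) / f_1(x) from the joint hypergeometric density. The sketch below compares the result with the mean and variance of a univariate hypergeometric variable that draws r - x items from n_2 + n_3; the parameter values and function names are assumptions made for this illustration.

```python
from fractions import Fraction
from math import comb


def hyp_pmf(x, y, r, n1, n2, n3):
    """Bivariate hypergeometric density HYP(r, n1, n2, n3), evaluated exactly."""
    z = r - x - y
    if x < 0 or y < 0 or z < 0:
        return Fraction(0)
    return Fraction(comb(n1, x) * comb(n2, y) * comb(n3, z), comb(n1 + n2 + n3, r))


def conditional_mean_var(x, r, n1, n2, n3):
    """E(Y/x) and Var(Y/x) computed from f(x, y) / f_1(x)."""
    weights = [hyp_pmf(x, y, r, n1, n2, n3) for y in range(r - x + 1)]
    marginal = sum(weights)  # this is f_1(x)
    mean = sum(y * w for y, w in enumerate(weights)) / marginal
    second = sum(y * y * w for y, w in enumerate(weights)) / marginal
    return mean, second - mean**2


r, n1, n2, n3, x = 5, 15, 7, 3, 2     # arbitrary illustrative values
m = r - x                             # draws left for the second and third groups
cond_mean, cond_var = conditional_mean_var(x, r, n1, n2, n3)

# Mean and variance of a hypergeometric variable with parameters n2, n3 and m.
expected_mean = Fraction(n2 * m, n2 + n3)
expected_var = (Fraction(n2 * n3, n2 + n3 - 1)
                * Fraction(m, n2 + n3) * Fraction(n2 + n3 - m, n2 + n3))
print(cond_mean == expected_mean, cond_var == expected_var)
```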

11.6. Bivariate Poisson Distribution 


The univariate Poisson distribution can be generalized to the bivariate
case. In 1934, Campbell first derived this distribution. However, in 1944,
Aitken gave the explicit formula for the bivariate Poisson distribution
function. In 1964, Holgate also arrived at the bivariate Poisson distribution by
deriving the joint distribution of $X = X_1 + X_3$ and $Y = X_2 + X_3$, where
$X_1, X_2, X_3$ are independent Poisson random variables. Unlike the previous
bivariate distributions, the conditional distributions of the bivariate Poisson
distribution are not Poisson. In fact, Seshadri and Patil (1964) indicated that
no bivariate distribution exists having both marginal and conditional
distributions of Poisson form.


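Definition 11.6. A discrete bivariate random variable (X, Y) is said to
have the bivariate Poisson distribution with parameters $\lambda_1, \lambda_2, \lambda_3$ if its joint
probability density is of the form

$$
f(x, y) =
\begin{cases}
\dfrac{e^{(-\lambda_1 - \lambda_2 + \lambda_3)}\, (\lambda_1 - \lambda_3)^{x}\, (\lambda_2 - \lambda_3)^{y}}{x!\, y!}\; \psi(x, y) & \text{for } x, y = 0, 1, \ldots, \infty \\[3mm]
0 & \text{otherwise,}
\end{cases}
$$

where

$$
\psi(x, y) := \sum_{r=0}^{\min(x, y)} \frac{x^{(r)}\, y^{(r)}\, \lambda_3^{r}}{(\lambda_1 - \lambda_3)^{r}\, (\lambda_2 - \lambda_3)^{r}\, r!}
$$

with

$$
x^{(r)} := x\,(x - 1) \cdots (x - r + 1),
$$

and $\lambda_1 > \lambda_3 > 0$, $\lambda_2 > \lambda_3 > 0$ are parameters. We denote a bivariate Poisson
random variable by writing $(X, Y) \sim \text{POI}(\lambda_1, \lambda_2, \lambda_3)$.

In the following theorem, we present the expected values and the variances
of X and Y, the covariance between X and Y and the joint moment
generating function.

Theorem 11.11. Let $(X, Y) \sim \text{POI}(\lambda_1, \lambda_2, \lambda_3)$, where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are
parameters. Then

$$
\begin{aligned}
E(X) &= \lambda_1 \\
E(Y) &= \lambda_2 \\
Var(X) &= \lambda_1 \\
Var(Y) &= \lambda_2 \\
Cov(X, Y) &= \lambda_3 \\
M(s, t) &= e^{-\lambda_1 - \lambda_2 + \lambda_3 + (\lambda_1 - \lambda_3) e^{s} + (\lambda_2 - \lambda_3) e^{t} + \lambda_3 e^{s+t}}.
\end{aligned}
$$
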

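The next theorem presents some special characteristics of the conditional
densities f(x/y) and f(y/x).

Theorem 11.12. Let $(X, Y) \sim \text{POI}(\lambda_1, \lambda_2, \lambda_3)$, where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are
parameters. Then

$$
\begin{aligned}
E(Y/x) &= \lambda_2 - \lambda_3 + \left(\frac{\lambda_3}{\lambda_1}\right) x \\
E(X/y) &= \lambda_1 - \lambda_3 + \left(\frac{\lambda_3}{\lambda_2}\right) y \\
Var(Y/x) &= \lambda_2 - \lambda_3 + \left(\frac{\lambda_3\,(\lambda_1 - \lambda_3)}{\lambda_1^{2}}\right) x \\
Var(X/y) &= \lambda_1 - \lambda_3 + \left(\frac{\lambda_3\,(\lambda_2 - \lambda_3)}{\lambda_2^{2}}\right) y.
\end{aligned}
$$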
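
The bivariate Poisson density is awkward to handle by hand, so a numerical check of its moments is a useful sanity test. The sketch below evaluates the density with the finite sum over r, truncates the infinite sums over x and y, and compares the results with E(X) = lambda_1, Cov(X, Y) = lambda_3, E(Y/x) = lambda_2 - lambda_3 + (lambda_3 / lambda_1) x and Var(Y/x) = lambda_2 - lambda_3 + lambda_3 (lambda_1 - lambda_3) x / lambda_1^2. The parameter values, the truncation length and all function names are assumptions made for this illustration.

```python
from math import exp, factorial


def falling(x, r):
    """Falling factorial x (x - 1) ... (x - r + 1); equals 1 when r = 0."""
    out = 1
    for i in range(r):
        out *= x - i
    return out


def poi_pmf(x, y, l1, l2, l3):
    """Bivariate Poisson density with parameters l1 > l3 > 0 and l2 > l3 > 0."""
    psi = sum(falling(x, r) * falling(y, r) * l3**r
              / ((l1 - l3) ** r * (l2 - l3) ** r * factorial(r))
              for r in range(min(x, y) + 1))
    return (exp(-l1 - l2 + l3) * (l1 - l3) ** x * (l2 - l3) ** y
            / (factorial(x) * factorial(y)) * psi)


def joint_moments(l1, l2, l3, terms=40):
    """E(X), Var(X) and Cov(X, Y) from a truncated double sum."""
    ex = ey = exx = exy = 0.0
    for x in range(terms):
        for y in range(terms):
            f = poi_pmf(x, y, l1, l2, l3)
            ex += x * f
            ey += y * f
            exx += x * x * f
            exy += x * y * f
    return ex, exx - ex**2, exy - ex * ey


def conditional_mean_var(x, l1, l2, l3, terms=40):
    """E(Y/x) and Var(Y/x) from f(x, y) / f_1(x), with the sum over y truncated."""
    weights = [poi_pmf(x, y, l1, l2, l3) for y in range(terms)]
    total = sum(weights)
    mean = sum(y * w for y, w in enumerate(weights)) / total
    second = sum(y * y * w for y, w in enumerate(weights)) / total
    return mean, second - mean**2


l1, l2, l3 = 2.0, 1.5, 0.5            # arbitrary illustrative parameters
ex, var_x, cov = joint_moments(l1, l2, l3)
print(ex, var_x, cov)                 # compare with l1, l1 and l3

cond_mean, cond_var = conditional_mean_var(3, l1, l2, l3)
print(cond_mean, l2 - l3 + (l3 / l1) * 3)
print(cond_var, l2 - l3 + l3 * (l1 - l3) / l1**2 * 3)
```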


11.7. Review Exercises 

1. A box contains 10 white marbles, 15 black marbles and 5 green marbles.
If 10 marbles are selected at random and without replacement, what is the
probability that 5 are white, 3 are black and 2 are green?

2. An urn contains 3 red balls, 2 green balls and 1 yellow ball. Three balls
are selected at random and without replacement from the urn. What is the
probability that at least 1 color is not drawn?

3. An urn contains 4 red balls, 8 green balls and 2 yellow balls. Five balls
are randomly selected, without replacement, from the urn. What is the
probability that 1 red ball, 2 green balls, and 2 yellow balls will be selected?

4. From a group of three Republicans, two Democrats, and one Independent,
a committee of two people is to be randomly selected. If X denotes the
number of Republicans and Y the number of Democrats on the committee,
then what is the variance of Y given that X = x?

5. If X equals the number of ones and Y the number of twos and threes
when four fair dice are rolled, then what is the conditional variance of X
given Y = 1?

6. Motor vehicles arriving at an intersection can turn right or left or continue
straight ahead. In a study of traffic patterns at this intersection over a long
period of time, engineers have noted that 40 percent of the motor vehicles
turn left, 25 percent turn right, and the remainder continue straight ahead.
For the next five cars entering the intersection, what is the probability that
at least one turns right?

7. Among a large number of applicants for a certain position, 60 percent
have only a high school education, 30 percent have some college training,
and 10 percent have completed a college degree. If 5 applicants are randomly
selected to be interviewed, what is the probability that at least one will have
completed a college degree?

8. In a population of 200 students who have just completed a first course
in calculus, 50 have earned A's, 80 B's and the remaining earned F's. A sample
of size 25 is taken at random and without replacement from this population.
What is the probability that 10 students have A's, 12 students have B's and
3 students have F's?

9. If X equals the number of ones and Y the number of twos and threes
when four fair dice are rolled, then what is the correlation coefficient of X
and Y?

10. If the joint moment generating function of X and Y is
$M(s, t) = \frac{k}{7 - e^{s} - 2e^{t}}$, then what is the value of the constant k? What is the
correlation coefficient between X and Y?

11. A die with 1 painted on three sides, 2 painted on two sides, and 3 painted
on one side is rolled 15 times. What is the probability that we will get eight
1's, six 2's and a 3 on the last roll?

12. The output of a machine is graded excellent 80 percent of the time, good 15
percent of the time, and defective 5 percent of the time. What is the probability
that a random sample of size 15 has 10 excellent, 3 good, and 2 defective
items?

13. An industrial product is graded by a machine excellent 80 percent of the
time, good 15 percent of the time, and defective 5 percent of the time. A random
sample of 15 items is graded. What is the probability that the machine will grade
10 excellent, 3 good, and 2 defective, with one of the defective items being the
last one graded?

14. If $(X, Y) \sim \text{HYP}(n_1, n_2, n_3, r)$, then what is the covariance of the
random variables X and Y?
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Chapter 12 

SOME 

SPECIAL CONTINUOUS 
BIVARIATE DISTRIBUTIONS 


In this chapter, we study some well known continuous bivariate probability
density functions. First, we present the natural extensions of univariate
probability density functions that were treated in chapter 6. Then we present
some other bivariate distributions that have been reported in the literature.
The bivariate normal distribution has been treated in most textbooks because
of its dominant role in statistical theory. The other continuous
bivariate distributions are rarely treated in textbooks. This textbook
treats these well known bivariate distributions for the first time. The
monograph of K.V. Mardia gives an excellent exposition of various bivariate
distributions. We begin this chapter with the bivariate uniform distribution.

12.1. Bivariate Uniform Distribution 

In this section, we study the Morgenstern bivariate uniform distribution in
detail. The marginals of the Morgenstern bivariate uniform distribution are
uniform. In this sense, it is an extension of the univariate uniform distribution.
Other bivariate uniform distributions will be pointed out without any
in-depth treatment.

In 1956, Morgenstern introduced a one-parameter family of bivariate
distributions whose univariate marginals are uniform distributions by the
following formula

$$
f(x, y) = f_1(x)\, f_2(y)\, \left(1 + \alpha\, [2F_1(x) - 1]\, [2F_2(y) - 1]\right),
$$

where $\alpha \in [-1, 1]$ is a parameter. If one assumes the cdf $F_i(x) = x$ and
the pdf $f_i(x) = 1$ ($i = 1, 2$), then we arrive at the Morgenstern uniform
distribution on the unit square. The joint probability density function f(x, y)
of the Morgenstern uniform distribution on the unit square is given by

$$
f(x, y) = 1 + \alpha\,(2x - 1)\,(2y - 1), \qquad 0 < x, y \le 1, \quad -1 \le \alpha \le 1.
$$


Next, we define the Morgenstern uniform distribution on an arbitrary
rectangle $[a, b] \times [c, d]$.

Definition 12.1. A continuous bivariate random variable (X, Y) is said to
have the bivariate uniform distribution on the rectangle $[a, b] \times [c, d]$ if its
joint probability density function is of the form

$$
f(x, y) =
\begin{cases}
\dfrac{1 + \alpha \left(\frac{2x - 2a}{b - a} - 1\right) \left(\frac{2y - 2c}{d - c} - 1\right)}{(b - a)\,(d - c)} & \text{for } x \in [a, b],\ y \in [c, d] \\[3mm]
0 & \text{otherwise,}
\end{cases}
$$

where $\alpha$ is an a priori chosen parameter in $[-1, 1]$. We denote a Morgenstern
bivariate uniform random variable on a rectangle $[a, b] \times [c, d]$ by writing
$(X, Y) \sim \text{UNIF}(a, b, c, d, \alpha)$.

The following figures show the graph and the equi-density curves of
f(x, y) on the unit square with $\alpha = 0.5$.

[Figure: graph of f(x, y) and the equi-density curves c = f(x, y) on the unit square]


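In the following theorem, we present the expected values, the variances
of the random variables X and Y, and the covariance between X and Y.

Theorem 12.1. Let $(X, Y) \sim \text{UNIF}(a, b, c, d, \alpha)$, where $a$, $b$, $c$, $d$ and $\alpha$
are parameters. Then

$$
\begin{aligned}
E(X) &= \frac{b + a}{2} \\
E(Y) &= \frac{d + c}{2} \\
Var(X) &= \frac{(b - a)^{2}}{12} \\
Var(Y) &= \frac{(d - c)^{2}}{12} \\
Cov(X, Y) &= \frac{1}{36}\, \alpha\, (b - a)\, (d - c).
\end{aligned}
$$

Proof: First, we determine the marginal density of X which is given by

$$
\begin{aligned}
f_1(x) &= \int_{c}^{d} f(x, y)\, dy \\
&= \int_{c}^{d} \frac{1 + \alpha \left(\frac{2x - 2a}{b - a} - 1\right) \left(\frac{2y - 2c}{d - c} - 1\right)}{(b - a)\,(d - c)}\, dy \\
&= \frac{1}{b - a}.
\end{aligned}
$$

Thus, the marginal density of X is uniform on the interval from a to b. That
is, $X \sim \text{UNIF}(a, b)$. Hence by Theorem 6.1, we have

$$
E(X) = \frac{b + a}{2} \qquad \text{and} \qquad Var(X) = \frac{(b - a)^{2}}{12}.
$$

Similarly, one can show that $Y \sim \text{UNIF}(c, d)$ and therefore by Theorem 6.1

$$
E(Y) = \frac{d + c}{2} \qquad \text{and} \qquad Var(Y) = \frac{(d - c)^{2}}{12}.
$$

The product moment of X and Y is

$$
\begin{aligned}
E(XY) &= \int_{c}^{d} \int_{a}^{b} x\,y\, f(x, y)\, dx\, dy \\
&= \int_{c}^{d} \int_{a}^{b} x\,y\, \frac{1 + \alpha \left(\frac{2x - 2a}{b - a} - 1\right) \left(\frac{2y - 2c}{d - c} - 1\right)}{(b - a)\,(d - c)}\, dx\, dy \\
&= \frac{1}{36}\, \alpha\, (b - a)\, (d - c) + \frac{1}{4}\, (b + a)\, (d + c).
\end{aligned}
$$

Thus, the covariance of X and Y is

$$
\begin{aligned}
Cov(X, Y) &= E(XY) - E(X)\, E(Y) \\
&= \frac{1}{36}\, \alpha\, (b - a)\, (d - c) + \frac{1}{4}\, (b + a)\, (d + c) - \frac{1}{4}\, (b + a)\, (d + c) \\
&= \frac{1}{36}\, \alpha\, (b - a)\, (d - c).
\end{aligned}
$$

This completes the proof of the theorem.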
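
The covariance of the Morgenstern uniform distribution can be confirmed numerically by integrating over the rectangle. The sketch below uses a simple midpoint rule written in plain Python; the rectangle, the value of alpha, the grid size and the function names are assumptions made for this illustration, and the closed form used for comparison is alpha (b - a)(d - c) / 36.

```python
def density(x, y, a, b, c, d, alpha):
    """Morgenstern uniform density on the rectangle [a, b] x [c, d]."""
    u = 2 * (x - a) / (b - a) - 1
    v = 2 * (y - c) / (d - c) - 1
    return (1 + alpha * u * v) / ((b - a) * (d - c))


def expectation(g, a, b, c, d, alpha, steps=200):
    """Midpoint-rule approximation of E[g(X, Y)] over the rectangle."""
    hx, hy = (b - a) / steps, (d - c) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * hx
        for j in range(steps):
            y = c + (j + 0.5) * hy
            total += g(x, y) * density(x, y, a, b, c, d, alpha)
    return total * hx * hy


a, b, c, d, alpha = 1.0, 3.0, 0.0, 2.0, 0.5   # arbitrary illustrative values
ex = expectation(lambda x, y: x, a, b, c, d, alpha)
ey = expectation(lambda x, y: y, a, b, c, d, alpha)
exy = expectation(lambda x, y: x * y, a, b, c, d, alpha)

cov_numeric = exy - ex * ey
cov_formula = alpha * (b - a) * (d - c) / 36
print(cov_numeric, cov_formula)
```

The two printed values agree only up to the discretisation error of the midpoint rule, which shrinks as `steps` grows.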

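In the next theorem, we state some information related to the conditional
densities f(y/x) and f(x/y).

Theorem 12.2. Let $(X, Y) \sim \text{UNIF}(a, b, c, d, \alpha)$, where $a$, $b$, $c$, $d$ and $\alpha$ are
parameters. Then

$$
\begin{aligned}
E(Y/x) &= \frac{d + c}{2} + \frac{\alpha\,(d - c)}{6} \left(\frac{2x - 2a}{b - a} - 1\right) \\
E(X/y) &= \frac{b + a}{2} + \frac{\alpha\,(b - a)}{6} \left(\frac{2y - 2c}{d - c} - 1\right) \\
Var(Y/x) &= \frac{1}{36} \left(\frac{d - c}{b - a}\right)^{2} \left[\alpha^{2}\,(a + b)\,(4x - a - b) + 3\,(b - a)^{2} - 4\,\alpha^{2} x^{2}\right] \\
Var(X/y) &= \frac{1}{36} \left(\frac{b - a}{d - c}\right)^{2} \left[\alpha^{2}\,(c + d)\,(4y - c - d) + 3\,(d - c)^{2} - 4\,\alpha^{2} y^{2}\right].
\end{aligned}
$$

Proof: First, we determine the conditional density function f(y/x). Recall
that $f_1(x) = \frac{1}{b - a}$. Hence,

$$
f(y/x) = \frac{1}{d - c} \left[1 + \alpha \left(\frac{2x - 2a}{b - a} - 1\right) \left(\frac{2y - 2c}{d - c} - 1\right)\right].
$$

The conditional expectation E(Y/x) is given by

$$
\begin{aligned}
E(Y/x) &= \int_{c}^{d} y\, f(y/x)\, dy \\
&= \frac{1}{d - c} \int_{c}^{d} y \left[1 + \alpha \left(\frac{2x - 2a}{b - a} - 1\right) \left(\frac{2y - 2c}{d - c} - 1\right)\right] dy \\
&= \frac{d + c}{2} + \frac{\alpha}{6\,(d - c)^{2}} \left(\frac{2x - 2a}{b - a} - 1\right) \left[d^{3} - c^{3} + 3dc^{2} - 3cd^{2}\right] \\
&= \frac{d + c}{2} + \frac{\alpha\,(d - c)}{6} \left(\frac{2x - 2a}{b - a} - 1\right).
\end{aligned}
$$

Similarly, the conditional expectation $E(Y^{2}/x)$ is given by

$$
\begin{aligned}
E(Y^{2}/x) &= \int_{c}^{d} y^{2}\, f(y/x)\, dy \\
&= \frac{1}{d - c} \int_{c}^{d} y^{2} \left[1 + \alpha \left(\frac{2x - 2a}{b - a} - 1\right) \left(\frac{2y - 2c}{d - c} - 1\right)\right] dy \\
&= \frac{d^{2} + dc + c^{2}}{3} + \frac{\alpha}{6}\, \left(d^{2} - c^{2}\right) \left(\frac{2x - 2a}{b - a} - 1\right).
\end{aligned}
$$

Therefore, the conditional variance of Y given the event X = x is

$$
\begin{aligned}
Var(Y/x) &= E(Y^{2}/x) - E(Y/x)^{2} \\
&= \frac{1}{36} \left(\frac{d - c}{b - a}\right)^{2} \left[\alpha^{2}\,(a + b)\,(4x - a - b) + 3\,(b - a)^{2} - 4\,\alpha^{2} x^{2}\right].
\end{aligned}
$$

The conditional expectation E(X/y) and the conditional variance Var(X/y)
can be found in a similar manner. This completes the proof of the theorem.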
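
The regression and scedastic curves of the Morgenstern uniform distribution can be checked by integrating the conditional density numerically. The sketch below is only an illustration: the rectangle, alpha, the point x and the function names are assumptions, and the closed forms used for comparison are E(Y/x) = (d + c)/2 + alpha (d - c) u / 6 with u = (2x - 2a)/(b - a) - 1, and Var(Y/x) = ((d - c)/(b - a))^2 [alpha^2 (a + b)(4x - a - b) + 3 (b - a)^2 - 4 alpha^2 x^2] / 36.

```python
def conditional_density(y, x, a, b, c, d, alpha):
    """Conditional density f(y/x) of the Morgenstern uniform distribution."""
    u = 2 * (x - a) / (b - a) - 1
    v = 2 * (y - c) / (d - c) - 1
    return (1 + alpha * u * v) / (d - c)


def conditional_mean_var(x, a, b, c, d, alpha, steps=2000):
    """Midpoint-rule approximations of E(Y/x) and Var(Y/x)."""
    h = (d - c) / steps
    ys = [c + (j + 0.5) * h for j in range(steps)]
    mean = sum(y * conditional_density(y, x, a, b, c, d, alpha) for y in ys) * h
    second = sum(y * y * conditional_density(y, x, a, b, c, d, alpha) for y in ys) * h
    return mean, second - mean**2


a, b, c, d, alpha, x = 1.0, 3.0, 0.0, 2.0, 0.5, 2.5   # arbitrary illustrative values
u = 2 * (x - a) / (b - a) - 1

mean_numeric, var_numeric = conditional_mean_var(x, a, b, c, d, alpha)
mean_formula = (d + c) / 2 + alpha * (d - c) / 6 * u
var_formula = ((d - c) / (b - a)) ** 2 / 36 * (
    alpha**2 * (a + b) * (4 * x - a - b) + 3 * (b - a) ** 2 - 4 * alpha**2 * x**2
)
print(mean_numeric, mean_formula)
print(var_numeric, var_formula)
```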

The following figure illustrates the regression and scedastic curves of the
Morgenstern uniform distribution function on the unit square with $\alpha = 0.5$.

Next, we give a definition of another generalized bivariate uniform
distribution.

Definition 12.2. Let $S \subset \mathbb{R}^{2}$ be a region in the Euclidean plane $\mathbb{R}^{2}$ with
area A. The random variables X and Y are said to be bivariate uniform over
S if the joint density of X and Y is of the form

$$
f(x, y) =
\begin{cases}
\dfrac{1}{A} & \text{for } (x, y) \in S \\[2mm]
0 & \text{otherwise.}
\end{cases}
$$

In 1965, Plackett constructed a class of bivariate distribution F(x, y) for 
given marginals F\ (x) and F 2 (y) as the square root of the equation 

(a — 1) F(x, y) 2 — {1 + (a — 1) [^i (or) + F 2 (y)\ } F(x,y) + a F x (x) F 2 (y) =0 

(where 0 < a < oo) which satisfies the Frechet inequalities 

max {Fi(x) + F 2 (y) - 1, 0} < F(x,y) < min {F^x), F 2 (y)} . 


The class of bivariate joint density functions constructed by Plackett is the
following:

$$f(x,y) = \alpha\, f_1(x)\, f_2(y)\, \frac{(\alpha - 1)\{F_1(x) + F_2(y) - 2F_1(x)F_2(y)\} + 1}{\left[S(x,y)^2 - 4\alpha(\alpha - 1)\, F_1(x)\, F_2(y)\right]^{3/2}},$$

where

$$S(x,y) = 1 + (\alpha - 1)\,(F_1(x) + F_2(y)).$$

If one takes $F_i(x) = x$ and $f_i(x) = 1$ (for $i = 1, 2$), then the joint density
function constructed by Plackett reduces to

$$f(x,y) = \frac{\alpha\,[(\alpha - 1)\{x + y - 2xy\} + 1]}{\left[\{1 + (\alpha - 1)(x + y)\}^2 - 4\alpha(\alpha - 1)\,xy\right]^{3/2}},$$

where $0 < x, y < 1$, and $\alpha > 0$. But unfortunately this is not a bivariate
density function since this bivariate density does not integrate to one. This
fact was missed by both Plackett (1965) and Mardia (1967).


12.2. Bivariate Cauchy Distribution 


Recall that the univariate Cauchy probability distribution was defined in
Chapter 3 as

$$f(x) = \frac{\theta}{\pi\,[\theta^2 + (x - \alpha)^2]}, \qquad -\infty < x < \infty,$$

where $\theta > 0$ and $\alpha$ are real parameters. The parameter $\alpha$ is called the
location parameter. In Chapter 4, we have pointed out that any random
variable whose probability density function is Cauchy has no moments. Further,
this random variable has no moment generating function. The Cauchy
distribution is widely used for instructional purposes besides its statistical
use. The main purpose of this section is to generalize the univariate Cauchy
distribution to the bivariate case and study its various intrinsic properties. We
define the bivariate Cauchy random variables by using the form of their joint
probability density function.


Definition 12.3. A continuous bivariate random variable $(X,Y)$ is said to
have the bivariate Cauchy distribution if its joint probability density function
is of the form

$$f(x,y) = \frac{\theta}{2\pi\,[\theta^2 + (x - \alpha)^2 + (y - \beta)^2]^{3/2}}, \qquad -\infty < x, y < \infty,$$

where $\theta$ is a positive parameter and $\alpha$ and $\beta$ are location parameters. We denote
a bivariate Cauchy random variable by writing $(X,Y) \sim CAU(\theta, \alpha, \beta)$.

The following figures show the graph and the equi-density curves of the
Cauchy density function $f(x,y)$ with parameters $\alpha = 0 = \beta$ and $\theta = 0.5$.

The bivariate Cauchy distribution can be derived by considering the
distribution of radioactive particles emanating from a source that hit a
two-dimensional screen. This distribution is a special case of the bivariate
t-distribution which was first constructed by Karl Pearson in 1923.

The following theorem shows that if a bivariate random variable $(X,Y)$ is
Cauchy, then it has no moments, like the univariate Cauchy random variable.
Further, for a bivariate Cauchy random variable $(X,Y)$, the covariance (and
hence the correlation) between $X$ and $Y$ does not exist.

Theorem 12.3. Let $(X,Y) \sim CAU(\theta, \alpha, \beta)$, where $\theta > 0$, $\alpha$ and $\beta$ are parameters.
Then the moments $E(X)$, $E(Y)$, $Var(X)$, $Var(Y)$, and $Cov(X,Y)$
do not exist.

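Proof: In order to find the moments of $X$ and $Y$, we need their marginal
distributions. First, we find the marginal of $X$ which is given by

$$f_1(x) = \int_{-\infty}^{\infty} f(x,y)\, dy = \int_{-\infty}^{\infty} \frac{\theta}{2\pi\,[\theta^2 + (x - \alpha)^2 + (y - \beta)^2]^{3/2}}\, dy.$$

To evaluate the above integral, we make a trigonometric substitution

$$y = \beta + \sqrt{\theta^2 + (x - \alpha)^2}\ \tan\psi.$$

Hence

$$dy = \sqrt{\theta^2 + (x - \alpha)^2}\ \sec^2\psi\, d\psi$$

and

$$\begin{aligned}
\left[\theta^2 + (x - \alpha)^2 + (y - \beta)^2\right]^{3/2}
&= \left[\theta^2 + (x - \alpha)^2\right]^{3/2} \left(1 + \tan^2\psi\right)^{3/2} \\
&= \left[\theta^2 + (x - \alpha)^2\right]^{3/2} \sec^3\psi.
\end{aligned}$$

Using these in the above integral, we get

$$\begin{aligned}
\int_{-\infty}^{\infty} \frac{\theta}{2\pi\,[\theta^2 + (x - \alpha)^2 + (y - \beta)^2]^{3/2}}\, dy
&= \frac{\theta}{2\pi} \int_{-\pi/2}^{\pi/2} \frac{\sqrt{\theta^2 + (x - \alpha)^2}\ \sec^2\psi\, d\psi}{\left[\theta^2 + (x - \alpha)^2\right]^{3/2} \sec^3\psi} \\
&= \frac{\theta}{2\pi\,[\theta^2 + (x - \alpha)^2]} \int_{-\pi/2}^{\pi/2} \cos\psi\, d\psi \\
&= \frac{\theta}{\pi\,[\theta^2 + (x - \alpha)^2]}.
\end{aligned}$$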

Hence, the marginal of $X$ is a Cauchy distribution with parameters $\theta$ and $\alpha$.
Thus, for the random variable $X$, the expected value $E(X)$ and the variance
$Var(X)$ do not exist (see Example 4.2). In a similar manner, it can be shown
that the marginal distribution of $Y$ is also Cauchy with parameters $\theta$ and $\beta$
and hence $E(Y)$ and $Var(Y)$ do not exist. Since

$$Cov(X,Y) = E(XY) - E(X)\, E(Y),$$

it is easy to see that $Cov(X,Y)$ also does not exist. This completes the proof
of the theorem.
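The marginal computed in the proof can be sanity-checked numerically. The sketch below is an illustration under stated assumptions rather than part of the proof: it integrates the joint Cauchy density over $y$ with the same substitution $y = \beta + \sqrt{\theta^2 + (x-\alpha)^2}\tan\psi$ and a midpoint rule, and compares the result with $\theta / (\pi[\theta^2 + (x-\alpha)^2])$. The function names are hypothetical.

```python
import math


def cauchy_joint(x, y, theta, alpha, beta):
    """Joint density of CAU(theta, alpha, beta) at (x, y)."""
    r2 = theta ** 2 + (x - alpha) ** 2 + (y - beta) ** 2
    return theta / (2 * math.pi * r2 ** 1.5)


def marginal_x_numeric(x, theta, alpha, beta, n=20000):
    """Integrate the joint density over y via y = beta + s * tan(psi)."""
    s = math.sqrt(theta ** 2 + (x - alpha) ** 2)
    h = math.pi / n
    total = 0.0
    for i in range(n):
        psi = -math.pi / 2 + (i + 0.5) * h
        y = beta + s * math.tan(psi)
        dy_dpsi = s / math.cos(psi) ** 2
        total += cauchy_joint(x, y, theta, alpha, beta) * dy_dpsi * h
    return total


def marginal_x_closed_form(x, theta, alpha):
    """Univariate Cauchy density with parameters theta and alpha."""
    return theta / (math.pi * (theta ** 2 + (x - alpha) ** 2))


if __name__ == "__main__":
    print(marginal_x_numeric(1.2, 0.5, 0.0, 0.0))
    print(marginal_x_closed_form(1.2, 0.5, 0.0))
```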


The conditional distribution of $Y$ given the event $X = x$ is given by

$$f(y/x) = \frac{f(x,y)}{f_1(x)} = \frac{1}{2}\, \frac{\theta^2 + (x - \alpha)^2}{[\theta^2 + (x - \alpha)^2 + (y - \beta)^2]^{3/2}}.$$

Similarly, the conditional distribution of $X$ given the event $Y = y$ is

$$f(x/y) = \frac{1}{2}\, \frac{\theta^2 + (y - \beta)^2}{[\theta^2 + (x - \alpha)^2 + (y - \beta)^2]^{3/2}}.$$

The next theorem states some properties of the conditional densities $f(y/x)$ and
$f(x/y)$.


Theorem 12.4. Let $(X,Y) \sim CAU(\theta, \alpha, \beta)$, where $\theta > 0$, $\alpha$ and $\beta$ are
parameters. Then the conditional expectations are

$$E(Y/x) = \beta, \qquad E(X/y) = \alpha,$$

and the conditional variances $Var(Y/x)$ and $Var(X/y)$ do not exist.

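Proof: First, we show that $E(Y/x)$ is $\beta$. The conditional expectation of $Y$
given the event $X = x$ can be computed as

$$\begin{aligned}
E(Y/x) &= \int_{-\infty}^{\infty} y\, f(y/x)\, dy \\
&= \int_{-\infty}^{\infty} y\, \frac{1}{2}\, \frac{\theta^2 + (x - \alpha)^2}{[\theta^2 + (x - \alpha)^2 + (y - \beta)^2]^{3/2}}\, dy \\
&= \frac{1}{4}\left[\theta^2 + (x - \alpha)^2\right] \int_{-\infty}^{\infty} \frac{d\left(\theta^2 + (x - \alpha)^2 + (y - \beta)^2\right)}{[\theta^2 + (x - \alpha)^2 + (y - \beta)^2]^{3/2}} \\
&\qquad + \frac{\beta}{2}\left[\theta^2 + (x - \alpha)^2\right] \int_{-\infty}^{\infty} \frac{dy}{[\theta^2 + (x - \alpha)^2 + (y - \beta)^2]^{3/2}} \\
&= \frac{1}{4}\left[\theta^2 + (x - \alpha)^2\right] \left[\frac{-2}{\sqrt{\theta^2 + (x - \alpha)^2 + (y - \beta)^2}}\right]_{-\infty}^{\infty} \\
&\qquad + \frac{\beta}{2}\left[\theta^2 + (x - \alpha)^2\right] \int_{-\pi/2}^{\pi/2} \frac{\cos\psi\, d\psi}{\theta^2 + (x - \alpha)^2} \\
&= 0 + \beta \\
&= \beta.
\end{aligned}$$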


Similarly, it can be shown that $E(X/y) = \alpha$. Next, we show that the conditional
variance of $Y$ given $X = x$ does not exist. To show this, we need
$E(Y^2/x)$, which is given by

$$E(Y^2/x) = \int_{-\infty}^{\infty} y^2\, \frac{1}{2}\, \frac{\theta^2 + (x - \alpha)^2}{[\theta^2 + (x - \alpha)^2 + (y - \beta)^2]^{3/2}}\, dy.$$

The above integral does not exist and hence the conditional second moment
of $Y$ given $X = x$ does not exist. As a consequence, $Var(Y/x)$ also does
not exist. Similarly, the variance of $X$ given the event $Y = y$ also does not
exist. This completes the proof of the theorem.
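One way to see why the integral for $E(Y^2/x)$ diverges, offered here as a supplementary sketch rather than as part of the original proof, is to compare the integrand with $1/|y|$ for large $|y|$. Writing $c^2 = \theta^2 + (x - \alpha)^2$ (the symbol $c$ is introduced only for this remark),

$$\frac{y^2}{\left[c^2 + (y - \beta)^2\right]^{3/2}} \sim \frac{1}{|y|} \quad \text{as } |y| \to \infty, \qquad \text{while} \qquad \int_1^{\infty} \frac{dy}{y} = \infty.$$

The tails of the integrand therefore behave like a non-integrable function, so the integral cannot converge.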


12.3. Bivariate Gamma Distributions 

In this section, we present three different bivariate gamma probability 
density functions and study some of their intrinsic properties. 

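Definition 12.4. A continuous bivariate random variable $(X,Y)$ is said to
have the bivariate gamma distribution if its joint probability density function
is of the form

$$f(x,y) = \begin{cases} \dfrac{(xy)^{\frac{1}{2}(\alpha - 1)}}{(1 - \theta)\, \Gamma(\alpha)\, \theta^{\frac{1}{2}(\alpha - 1)}}\ e^{-\frac{x + y}{1 - \theta}}\ I_{\alpha - 1}\!\left(\dfrac{2\sqrt{\theta x y}}{1 - \theta}\right) & \text{if } 0 < x, y < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta \in [0,1)$ and $\alpha > 0$ are parameters, and

$$I_k(z) := \sum_{r=0}^{\infty} \frac{\left(\frac{1}{2} z\right)^{k + 2r}}{r!\ \Gamma(k + r + 1)}.$$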

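As usual, we denote this bivariate gamma random variable by writing
$(X,Y) \sim GAMK(\alpha, \theta)$. The function $I_k(z)$ is called the modified Bessel
function of the first kind of order $k$. In explicit form $f(x,y)$ is given by

$$f(x,y) = \begin{cases} \dfrac{1}{\theta^{\alpha - 1}\, \Gamma(\alpha)}\ e^{-\frac{x + y}{1 - \theta}} \displaystyle\sum_{k=0}^{\infty} \dfrac{(\theta x y)^{\alpha + k - 1}}{k!\ \Gamma(\alpha + k)\ (1 - \theta)^{\alpha + 2k}} & \text{for } 0 < x, y < \infty \\ 0 & \text{otherwise.} \end{cases}$$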


The following figures show the graph of the joint density function $f(x,y)$
of a bivariate gamma random variable with parameters $\alpha = 1$ and $\theta = 0.5$
and the equi-density curves of $f(x,y)$.

[Figure: the equi-density curves $c = f(x,y)$]

In 1941, Kibble found this bivariate gamma density function. However,
Wicksell in 1933 had constructed the characteristic function of this bivariate
gamma density function without knowing the explicit form of this density
function. If $\{(X_i, Y_i) \mid i = 1, 2, \ldots, n\}$ is a random sample from a bivariate
normal distribution with zero means, then the bivariate random variable
$(X,Y)$, where

$$X = \frac{1}{n}\sum_{i=1}^{n} X_i^2 \qquad \text{and} \qquad Y = \frac{1}{n}\sum_{i=1}^{n} Y_i^2,$$

has a bivariate gamma distribution. This fact was established by Wicksell by finding the characteristic
function of $(X,Y)$. This bivariate gamma distribution has found applications
in noise theory (see Rice (1944, 1945)).

The following theorem gives some important characteristics of the
bivariate gamma distribution of Kibble.

Theorem 12.5. Let the random variable $(X,Y) \sim GAMK(\alpha, \theta)$, where
$0 < \alpha < \infty$ and $0 \le \theta < 1$ are parameters. Then the marginals of $X$ and $Y$
are univariate gamma and

$$\begin{aligned}
E(X) &= \alpha \\
E(Y) &= \alpha \\
Var(X) &= \alpha \\
Var(Y) &= \alpha \\
Cov(X,Y) &= \alpha\,\theta \\
M(s,t) &= \frac{1}{[(1 - s)(1 - t) - \theta\, s\, t]^{\alpha}}.
\end{aligned}$$


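Proof: First, we show that the marginal distribution of $X$ is univariate
gamma with shape parameter $\alpha$ (and scale parameter equal to 1). The marginal density of $X$ is given
by

$$\begin{aligned}
f_1(x) &= \int_0^{\infty} f(x,y)\, dy \\
&= \int_0^{\infty} \frac{1}{\theta^{\alpha - 1}\, \Gamma(\alpha)}\ e^{-\frac{x + y}{1 - \theta}} \sum_{k=0}^{\infty} \frac{(\theta x y)^{\alpha + k - 1}}{k!\ \Gamma(\alpha + k)\ (1 - \theta)^{\alpha + 2k}}\, dy \\
&= \sum_{k=0}^{\infty} \frac{1}{\theta^{\alpha - 1}\, \Gamma(\alpha)}\ e^{-\frac{x}{1 - \theta}}\ \frac{(\theta x)^{\alpha + k - 1}}{k!\ \Gamma(\alpha + k)\ (1 - \theta)^{\alpha + 2k}} \int_0^{\infty} y^{\alpha + k - 1}\, e^{-\frac{y}{1 - \theta}}\, dy \\
&= \sum_{k=0}^{\infty} \frac{1}{\theta^{\alpha - 1}\, \Gamma(\alpha)}\ e^{-\frac{x}{1 - \theta}}\ \frac{(\theta x)^{\alpha + k - 1}}{k!\ \Gamma(\alpha + k)\ (1 - \theta)^{\alpha + 2k}}\ (1 - \theta)^{\alpha + k}\, \Gamma(\alpha + k) \\
&= \sum_{k=0}^{\infty} \left(\frac{\theta}{1 - \theta}\right)^{k} \frac{1}{k!\ \Gamma(\alpha)}\ x^{\alpha + k - 1}\, e^{-\frac{x}{1 - \theta}} \\
&= \frac{1}{\Gamma(\alpha)}\ x^{\alpha - 1}\, e^{-\frac{x}{1 - \theta}} \sum_{k=0}^{\infty} \frac{1}{k!} \left(\frac{x\theta}{1 - \theta}\right)^{k} \\
&= \frac{1}{\Gamma(\alpha)}\ x^{\alpha - 1}\, e^{-\frac{x}{1 - \theta}}\ e^{\frac{x\theta}{1 - \theta}} \\
&= \frac{1}{\Gamma(\alpha)}\ x^{\alpha - 1}\, e^{-x}.
\end{aligned}$$

Thus, the marginal distribution of $X$ is gamma with shape parameter $\alpha$ and scale parameter 1.
Therefore, by Theorem 6.3, we obtain

$$E(X) = \alpha, \qquad Var(X) = \alpha.$$

Similarly, we can show that the marginal density of $Y$ is gamma with shape parameter
$\alpha$ and scale parameter 1. Hence, we have

$$E(Y) = \alpha, \qquad Var(Y) = \alpha.$$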


The moment generating function can be computed in a similar manner and 
we leave it to the reader. This completes the proof of the theorem. 
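The marginal claim can also be checked numerically. The following is a rough sketch under stated assumptions, not a definitive implementation: it evaluates the explicit series form of Kibble's density (truncated after a fixed number of terms), integrates it over $y$ with a midpoint rule on a finite interval, and compares the result with the gamma density $x^{\alpha-1}e^{-x}/\Gamma(\alpha)$. The truncation limits and function names are illustrative choices.

```python
import math


def kibble_density(x, y, alpha, theta, terms=60):
    """Truncated explicit series for Kibble's density (x, y > 0, 0 < theta < 1)."""
    log_pref = (-(x + y) / (1 - theta)
                - (alpha - 1) * math.log(theta) - math.lgamma(alpha))
    log_txy = math.log(theta * x * y)
    log_1mt = math.log(1 - theta)
    total = 0.0
    for k in range(terms):
        log_term = ((alpha + k - 1) * log_txy
                    - math.lgamma(k + 1) - math.lgamma(alpha + k)
                    - (alpha + 2 * k) * log_1mt)
        total += math.exp(log_pref + log_term)
    return total


def kibble_marginal_x(x, alpha, theta, upper=40.0, n=4000):
    """Midpoint-rule integral of the joint density over 0 < y < upper."""
    h = upper / n
    return h * sum(kibble_density(x, (i + 0.5) * h, alpha, theta)
                   for i in range(n))


def gamma_density(x, alpha):
    """Gamma density with shape alpha and unit scale."""
    return math.exp((alpha - 1) * math.log(x) - x - math.lgamma(alpha))


if __name__ == "__main__":
    print(kibble_marginal_x(1.5, 2.0, 0.5), gamma_density(1.5, 2.0))
```

The truncation values are adequate for moderate $x$ and $\theta$ well below 1; larger arguments would need more series terms and a longer integration range.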


The following results are needed for the next theorem. From calculus we
know that

$$e^{z} = \sum_{k=0}^{\infty} \frac{z^k}{k!}, \tag{12.1}$$

and the infinite series on the right converges for all $z \in \mathbb{R}$. Differentiating
both sides of (12.1) and then multiplying the resulting expression by $z$, one
obtains

$$z\, e^{z} = \sum_{k=0}^{\infty} k\, \frac{z^k}{k!}. \tag{12.2}$$

If one differentiates (12.2) again with respect to $z$ and multiplies the resulting
expression by $z$, then one gets

$$z\, e^{z} + z^2 e^{z} = \sum_{k=0}^{\infty} k^2\, \frac{z^k}{k!}. \tag{12.3}$$

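Theorem 12.6. Let the random variable $(X,Y) \sim GAMK(\alpha, \theta)$, where
$0 < \alpha < \infty$ and $0 \le \theta < 1$ are parameters. Then

$$\begin{aligned}
E(Y/x) &= \theta x + (1 - \theta)\,\alpha \\
E(X/y) &= \theta y + (1 - \theta)\,\alpha \\
Var(Y/x) &= (1 - \theta)\,[\,2\theta x + (1 - \theta)\,\alpha\,] \\
Var(X/y) &= (1 - \theta)\,[\,2\theta y + (1 - \theta)\,\alpha\,].
\end{aligned}$$

Proof: First, we will find the conditional probability density function of $Y$
given $X = x$, which is given by

$$\begin{aligned}
f(y/x) &= \frac{f(x,y)}{f_1(x)} \\
&= \frac{1}{\theta^{\alpha - 1}\, x^{\alpha - 1}\, e^{-x}}\ e^{-\frac{x + y}{1 - \theta}} \sum_{k=0}^{\infty} \frac{(\theta x y)^{\alpha + k - 1}}{k!\ \Gamma(\alpha + k)\ (1 - \theta)^{\alpha + 2k}} \\
&= e^{x - \frac{x}{1 - \theta}} \sum_{k=0}^{\infty} \frac{1}{\Gamma(\alpha + k)\ (1 - \theta)^{\alpha + 2k}}\ \frac{(\theta x)^{k}}{k!}\ y^{\alpha + k - 1}\, e^{-\frac{y}{1 - \theta}}.
\end{aligned}$$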

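Next, we compute the conditional expectation of $Y$ given the event $X = x$.
The conditional expectation $E(Y/x)$ is given by

$$\begin{aligned}
E(Y/x) &= \int_0^{\infty} y\, f(y/x)\, dy \\
&= \int_0^{\infty} y\ e^{x - \frac{x}{1 - \theta}} \sum_{k=0}^{\infty} \frac{1}{\Gamma(\alpha + k)\ (1 - \theta)^{\alpha + 2k}}\ \frac{(\theta x)^{k}}{k!}\ y^{\alpha + k - 1}\, e^{-\frac{y}{1 - \theta}}\, dy \\
&= e^{x - \frac{x}{1 - \theta}} \sum_{k=0}^{\infty} \frac{1}{\Gamma(\alpha + k)\ (1 - \theta)^{\alpha + 2k}}\ \frac{(\theta x)^{k}}{k!} \int_0^{\infty} y^{\alpha + k}\, e^{-\frac{y}{1 - \theta}}\, dy \\
&= e^{x - \frac{x}{1 - \theta}} \sum_{k=0}^{\infty} \frac{1}{\Gamma(\alpha + k)\ (1 - \theta)^{\alpha + 2k}}\ \frac{(\theta x)^{k}}{k!}\ (1 - \theta)^{\alpha + k + 1}\, \Gamma(\alpha + k + 1) \\
&= (1 - \theta)\ e^{x - \frac{x}{1 - \theta}} \sum_{k=0}^{\infty} (\alpha + k)\, \frac{1}{k!} \left(\frac{\theta x}{1 - \theta}\right)^{k} \\
&= (1 - \theta)\ e^{x - \frac{x}{1 - \theta}} \left[\alpha\, e^{\frac{\theta x}{1 - \theta}} + \frac{\theta x}{1 - \theta}\, e^{\frac{\theta x}{1 - \theta}}\right] \qquad \text{(by (12.1) and (12.2))} \\
&= (1 - \theta)\,\alpha + \theta x.
\end{aligned}$$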

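In order to determine the conditional variance of $Y$ given the event $X = x$,
we need the conditional expectation of $Y^2$ given the event $X = x$. This
conditional expectation can be evaluated as follows:

$$\begin{aligned}
E(Y^2/x) &= \int_0^{\infty} y^2 f(y/x)\, dy \\
&= \int_0^{\infty} y^2\ e^{x - \frac{x}{1 - \theta}} \sum_{k=0}^{\infty} \frac{1}{\Gamma(\alpha + k)\ (1 - \theta)^{\alpha + 2k}}\ \frac{(\theta x)^{k}}{k!}\ y^{\alpha + k - 1}\, e^{-\frac{y}{1 - \theta}}\, dy \\
&= e^{x - \frac{x}{1 - \theta}} \sum_{k=0}^{\infty} \frac{1}{\Gamma(\alpha + k)\ (1 - \theta)^{\alpha + 2k}}\ \frac{(\theta x)^{k}}{k!} \int_0^{\infty} y^{\alpha + k + 1}\, e^{-\frac{y}{1 - \theta}}\, dy \\
&= e^{x - \frac{x}{1 - \theta}} \sum_{k=0}^{\infty} \frac{1}{\Gamma(\alpha + k)\ (1 - \theta)^{\alpha + 2k}}\ \frac{(\theta x)^{k}}{k!}\ (1 - \theta)^{\alpha + k + 2}\, \Gamma(\alpha + k + 2) \\
&= (1 - \theta)^2\ e^{x - \frac{x}{1 - \theta}} \sum_{k=0}^{\infty} (\alpha + k + 1)(\alpha + k)\, \frac{1}{k!} \left(\frac{\theta x}{1 - \theta}\right)^{k} \\
&= (1 - \theta)^2\ e^{x - \frac{x}{1 - \theta}} \sum_{k=0}^{\infty} \left(\alpha^2 + 2\alpha k + k^2 + \alpha + k\right) \frac{1}{k!} \left(\frac{\theta x}{1 - \theta}\right)^{k} \\
&= (1 - \theta)^2 \left[\alpha^2 + \alpha + (2\alpha + 1)\, \frac{\theta x}{1 - \theta} + \frac{\theta x}{1 - \theta} + \left(\frac{\theta x}{1 - \theta}\right)^2\right] \\
&= (\alpha^2 + \alpha)\,(1 - \theta)^2 + 2(\alpha + 1)\, \theta\, (1 - \theta)\, x + \theta^2 x^2.
\end{aligned}$$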

The conditional variance of $Y$ given $X = x$ is

$$\begin{aligned}
Var(Y/x) &= E(Y^2/x) - E(Y/x)^2 \\
&= (\alpha^2 + \alpha)\,(1 - \theta)^2 + 2(\alpha + 1)\, \theta\, (1 - \theta)\, x + \theta^2 x^2 \\
&\qquad - \left[(1 - \theta)^2 \alpha^2 + \theta^2 x^2 + 2\alpha\, \theta\, (1 - \theta)\, x\right] \\
&= (1 - \theta)\,[\,\alpha\,(1 - \theta) + 2\theta x\,].
\end{aligned}$$

Since the density function $f(x,y)$ is symmetric, that is $f(x,y) = f(y,x)$,
the conditional expectation $E(X/y)$ and the conditional variance $Var(X/y)$
can be obtained by interchanging $x$ with $y$ in the formulae of $E(Y/x)$ and
$Var(Y/x)$. This completes the proof of the theorem.
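The conditional density $f(y/x)$ derived in the proof can be read as a Poisson mixture of gamma densities: with $z = \theta x/(1-\theta)$, the $k$-th term is the Poisson weight $e^{-z}z^k/k!$ times a gamma density with shape $\alpha + k$ and scale $1 - \theta$. This reading is an observation added here, not a statement from the text. Under it, the conditional moments can be computed by a weighted sum, as in the following sketch (function name and truncation length are illustrative).

```python
import math


def kibble_conditional_moments(x, alpha, theta, terms=200):
    """E(Y/x) and Var(Y/x) via the Poisson-weighted sum of gamma moments."""
    z = theta * x / (1 - theta)
    scale = 1 - theta
    weight = math.exp(-z)          # Poisson probability of k = 0
    m1 = 0.0
    m2 = 0.0
    for k in range(terms):
        shape = alpha + k
        m1 += weight * shape * scale                      # gamma mean
        m2 += weight * shape * (shape + 1) * scale ** 2   # gamma second moment
        weight *= z / (k + 1)      # move to the Poisson probability of k + 1
    return m1, m2 - m1 * m1


if __name__ == "__main__":
    x, alpha, theta = 2.0, 1.5, 0.4
    print(kibble_conditional_moments(x, alpha, theta))
    print(theta * x + (1 - theta) * alpha,
          (1 - theta) * (2 * theta * x + (1 - theta) * alpha))
```

Both printed lines should show the same mean and variance, matching the formulas of Theorem 12.6.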

In 1941, Cherian constructed a bivariate gamma distribution whose probability
density function is given by

$$f(x,y) = \begin{cases} \dfrac{e^{-(x + y)}}{\prod_{i=1}^{3} \Gamma(\alpha_i)} \displaystyle\int_0^{\min\{x,y\}} \dfrac{z^{\alpha_3}\, (x - z)^{\alpha_1}\, (y - z)^{\alpha_2}}{z\, (x - z)\, (y - z)}\ e^{z}\, dz & \text{if } 0 < x, y < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $\alpha_1, \alpha_2, \alpha_3 \in (0, \infty)$ are parameters. If a bivariate random variable
$(X,Y)$ has a Cherian bivariate gamma probability density function
with parameters $\alpha_1$, $\alpha_2$ and $\alpha_3$, then we denote this by writing
$(X,Y) \sim GAMC(\alpha_1, \alpha_2, \alpha_3)$.

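It can be shown that the marginals of $f(x,y)$ are given by

$$f_1(x) = \begin{cases} \dfrac{1}{\Gamma(\alpha_1 + \alpha_3)}\ x^{\alpha_1 + \alpha_3 - 1}\, e^{-x} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise} \end{cases}$$

and

$$f_2(y) = \begin{cases} \dfrac{1}{\Gamma(\alpha_2 + \alpha_3)}\ y^{\alpha_2 + \alpha_3 - 1}\, e^{-y} & \text{if } 0 < y < \infty \\ 0 & \text{otherwise.} \end{cases}$$

Hence, we have the following theorem.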

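Theorem 12.7. If $(X,Y) \sim GAMC(\alpha, \beta, \gamma)$, then

$$\begin{aligned}
E(X) &= \alpha + \gamma \\
E(Y) &= \beta + \gamma \\
Var(X) &= \alpha + \gamma \\
Var(Y) &= \beta + \gamma \\
E(XY) &= \gamma + (\alpha + \gamma)(\beta + \gamma).
\end{aligned}$$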
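Theorem 12.7 can be illustrated by simulation. The sketch below relies on an assumption not stated in the text: that a Cherian pair can be generated as $X = U_1 + U_3$, $Y = U_2 + U_3$ with independent unit-scale gamma variables $U_1, U_2, U_3$ of shapes $\alpha, \beta, \gamma$ (the usual "shared component" construction). The function names are hypothetical.

```python
import random


def sample_cherian(alpha, beta, gamma, n, seed=0):
    """Draw n pairs (X, Y) as X = U1 + U3, Y = U2 + U3 (assumed construction)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u1 = rng.gammavariate(alpha, 1.0)
        u2 = rng.gammavariate(beta, 1.0)
        u3 = rng.gammavariate(gamma, 1.0)   # shared component
        xs.append(u1 + u3)
        ys.append(u2 + u3)
    return xs, ys


def sample_mean(values):
    return sum(values) / len(values)


def sample_product_moment(xs, ys):
    return sum(x * y for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    a, b, g = 1.0, 2.0, 1.5
    xs, ys = sample_cherian(a, b, g, 100000)
    print(sample_mean(xs), a + g)
    print(sample_mean(ys), b + g)
    print(sample_product_moment(xs, ys), g + (a + g) * (b + g))
```

Each printed line pairs a simulated value with the corresponding value from Theorem 12.7; the two should be close, up to Monte Carlo error.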


The following theorem can be established by first computing the conditional
probability density functions. We leave the proof of the following
theorem to the reader.

Theorem 12.8. If $(X,Y) \sim GAMC(\alpha, \beta, \gamma)$, then

$$E(Y/x) = \beta + \frac{\gamma}{\alpha + \gamma}\, x \qquad \text{and} \qquad E(X/y) = \alpha + \frac{\gamma}{\beta + \gamma}\, y.$$

David and Fix (1961) have studied the rank correlation and regression for
samples from this distribution. For an account of this bivariate gamma
distribution the interested reader should refer to Moran (1967).

In 1934, McKay gave another bivariate gamma distribution whose probability
density function is of the form

$$f(x,y) = \begin{cases} \dfrac{\theta^{\alpha + \beta}}{\Gamma(\alpha)\, \Gamma(\beta)}\ x^{\alpha - 1}\, (y - x)^{\beta - 1}\, e^{-\theta y} & \text{if } 0 < x < y < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta, \alpha, \beta \in (0, \infty)$ are parameters. If the form of the joint density of
the random variable $(X,Y)$ is similar to the density function of the bivariate
gamma distribution of McKay, then we write $(X,Y) \sim GAMM(\theta, \alpha, \beta)$.
The graph of the probability density function $f(x,y)$ of the bivariate gamma
distribution of McKay for $\theta = \alpha = \beta = 2$ is shown below. The other figure
illustrates the equi-density curves of this joint density function $f(x,y)$.




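It can be shown that if $(X,Y) \sim GAMM(\theta, \alpha, \beta)$, then the marginal $f_1(x)$
of $X$ and the marginal $f_2(y)$ of $Y$ are given by

$$f_1(x) = \begin{cases} \dfrac{\theta^{\alpha}}{\Gamma(\alpha)}\ x^{\alpha - 1}\, e^{-\theta x} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise} \end{cases}$$

and

$$f_2(y) = \begin{cases} \dfrac{\theta^{\alpha + \beta}}{\Gamma(\alpha + \beta)}\ y^{\alpha + \beta - 1}\, e^{-\theta y} & \text{if } 0 < y < \infty \\ 0 & \text{otherwise.} \end{cases}$$

Hence $X \sim GAM\left(\alpha, \frac{1}{\theta}\right)$ and $Y \sim GAM\left(\alpha + \beta, \frac{1}{\theta}\right)$. Therefore, we have the
following theorem.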

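Theorem 12.9. If $(X,Y) \sim GAMM(\theta, \alpha, \beta)$, then

$$\begin{aligned}
E(X) &= \frac{\alpha}{\theta} \\
E(Y) &= \frac{\alpha + \beta}{\theta} \\
Var(X) &= \frac{\alpha}{\theta^2} \\
Var(Y) &= \frac{\alpha + \beta}{\theta^2} \\
M(s,t) &= \left(\frac{\theta}{\theta - s - t}\right)^{\alpha} \left(\frac{\theta}{\theta - t}\right)^{\beta}.
\end{aligned}$$

We state the various properties of the conditional densities of $f(x,y)$,
without proof, in the following theorem.

Theorem 12.10. If $(X,Y) \sim GAMM(\theta, \alpha, \beta)$, then

$$\begin{aligned}
E(Y/x) &= x + \frac{\beta}{\theta} \\
E(X/y) &= \frac{\alpha\, y}{\alpha + \beta} \\
Var(Y/x) &= \frac{\beta}{\theta^2} \\
Var(X/y) &= \frac{\alpha\, \beta}{(\alpha + \beta)^2\, (\alpha + \beta + 1)}\ y^2.
\end{aligned}$$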

We know that the univariate exponential distribution is a special case
of the univariate gamma distribution. Similarly, the bivariate exponential
distribution is a special case of the bivariate gamma distribution. On taking the
index parameters to be unity in the Kibble and Cherian bivariate gamma
distributions given above, we obtain the corresponding bivariate exponential
distributions. The bivariate exponential probability density function
corresponding to the bivariate gamma distribution of Kibble is given by

$$f(x,y) = \begin{cases} e^{-\left(\frac{x + y}{1 - \theta}\right)} \displaystyle\sum_{k=0}^{\infty} \dfrac{(\theta x y)^{k}}{k!\ \Gamma(k + 1)\ (1 - \theta)^{2k + 1}} & \text{if } 0 < x, y < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta \in (0,1)$ is a parameter. The bivariate exponential distribution
corresponding to the Cherian bivariate distribution is the following:

$$f(x,y) = \begin{cases} \left[e^{\min\{x,y\}} - 1\right] e^{-(x + y)} & \text{if } 0 < x, y < \infty \\ 0 & \text{otherwise.} \end{cases}$$

In 1960, Gumbel studied the following bivariate exponential distribution
whose density function is given by

$$f(x,y) = \begin{cases} \left[(1 + \theta x)(1 + \theta y) - \theta\right] e^{-(x + y + \theta x y)} & \text{if } 0 < x, y < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta > 0$ is a parameter.

In 1967, Marshall and Olkin introduced the following bivariate exponential
distribution

$$F(x,y) = \begin{cases} 1 - e^{-(\alpha + \gamma)x} - e^{-(\beta + \gamma)y} + e^{-(\alpha x + \beta y + \gamma \max\{x,y\})} & \text{if } x, y > 0 \\ 0 & \text{otherwise,} \end{cases}$$

where $\alpha, \beta, \gamma > 0$ are parameters. The exponential distribution function of
Marshall and Olkin satisfies the lack of memory property

$$P(X > x + t,\ Y > y + t \ /\ X > t,\ Y > t) = P(X > x,\ Y > y).$$
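The lack of memory property is easy to check once the joint survival function is written down. The sketch below assumes the survival function $P(X > x, Y > y) = e^{-(\alpha x + \beta y + \gamma \max\{x,y\})}$, which is consistent with the distribution function $F(x,y)$ given above by inclusion-exclusion; the function names are illustrative.

```python
import math


def mo_survival(x, y, alpha, beta, gamma):
    """P(X > x, Y > y) for the Marshall-Olkin distribution (x, y >= 0)."""
    return math.exp(-(alpha * x + beta * y + gamma * max(x, y)))


def mo_cdf(x, y, alpha, beta, gamma):
    """Distribution function F(x, y) as given in the text."""
    if x <= 0 or y <= 0:
        return 0.0
    return (1.0
            - math.exp(-(alpha + gamma) * x)
            - math.exp(-(beta + gamma) * y)
            + mo_survival(x, y, alpha, beta, gamma))


def conditional_survival(x, y, t, alpha, beta, gamma):
    """P(X > x + t, Y > y + t / X > t, Y > t)."""
    return (mo_survival(x + t, y + t, alpha, beta, gamma)
            / mo_survival(t, t, alpha, beta, gamma))


if __name__ == "__main__":
    args = (0.7, 1.3, 0.5)
    print(conditional_survival(0.4, 1.1, 2.0, *args))
    print(mo_survival(0.4, 1.1, *args))
```

The two printed values coincide because $\max\{x + t, y + t\} = \max\{x, y\} + t$, so the factor depending on $t$ cancels in the ratio.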


12.4. Bivariate Beta Distribution 


The bivariate beta distribution (also known as the Dirichlet distribution) is
one of the basic distributions in statistics. The bivariate beta distribution
is used in geology, biology, and chemistry for handling compositional data
which are subject to nonnegativity and constant-sum constraints. It is also
used nowadays with increasing frequency in statistical modeling, distribution
theory and Bayesian statistics. For example, it is used to model the
distribution of brand shares of certain consumer products, and in describing
the joint distribution of two soil strength parameters. Further, it is used in
modeling the proportions of the electorate who vote for a candidate in a
two-candidate election. In Bayesian statistics, the beta distribution is very
popular as a prior since it yields a beta distribution as posterior. In this
section, we give some basic facts about the bivariate beta distribution.

Definition 12.5. A continuous bivariate random variable $(X,Y)$ is said to
have the bivariate beta distribution if its joint probability density function is
of the form

$$f(x,y) = \begin{cases} \dfrac{\Gamma(\theta_1 + \theta_2 + \theta_3)}{\Gamma(\theta_1)\, \Gamma(\theta_2)\, \Gamma(\theta_3)}\ x^{\theta_1 - 1}\, y^{\theta_2 - 1}\, (1 - x - y)^{\theta_3 - 1} & \text{if } 0 < x, y,\ x + y < 1 \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta_1, \theta_2, \theta_3$ are positive parameters. We will denote a bivariate beta
random variable $(X,Y)$ with positive parameters $\theta_1$, $\theta_2$ and $\theta_3$ by writing
$(X,Y) \sim Beta(\theta_1, \theta_2, \theta_3)$.

The following figures show the graph and the equi-density curves of
$f(x,y)$ on the domain of its definition.

[Figures: joint PDF of random variables $X$ and $Y$; the equi-density curves $c = f(x,y)$]

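In the following theorem, we present the expected values, the variances
of the random variables $X$ and $Y$, and the covariance between $X$ and $Y$.

Theorem 12.11. Let $(X,Y) \sim Beta(\theta_1, \theta_2, \theta_3)$, where $\theta_1$, $\theta_2$ and $\theta_3$ are
positive a priori chosen parameters. Then $X \sim Beta(\theta_1, \theta_2 + \theta_3)$ and
$Y \sim Beta(\theta_2, \theta_1 + \theta_3)$ and

$$\begin{aligned}
E(X) &= \frac{\theta_1}{\theta}, & Var(X) &= \frac{\theta_1\, (\theta - \theta_1)}{\theta^2\, (\theta + 1)}, \\
E(Y) &= \frac{\theta_2}{\theta}, & Var(Y) &= \frac{\theta_2\, (\theta - \theta_2)}{\theta^2\, (\theta + 1)}, \\
Cov(X,Y) &= -\frac{\theta_1\, \theta_2}{\theta^2\, (\theta + 1)},
\end{aligned}$$

where $\theta = \theta_1 + \theta_2 + \theta_3$.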

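Proof: First, we show that $X \sim Beta(\theta_1, \theta_2 + \theta_3)$ and $Y \sim Beta(\theta_2, \theta_1 + \theta_3)$.
Since $(X,Y) \sim Beta(\theta_1, \theta_2, \theta_3)$, the joint density of $(X,Y)$ is given by

$$f(x,y) = \frac{\Gamma(\theta)}{\Gamma(\theta_1)\, \Gamma(\theta_2)\, \Gamma(\theta_3)}\ x^{\theta_1 - 1}\, y^{\theta_2 - 1}\, (1 - x - y)^{\theta_3 - 1},$$

where $\theta = \theta_1 + \theta_2 + \theta_3$. Thus the marginal density of $X$ is given by

$$\begin{aligned}
f_1(x) &= \int_0^{1 - x} f(x,y)\, dy \\
&= \frac{\Gamma(\theta)}{\Gamma(\theta_1)\, \Gamma(\theta_2)\, \Gamma(\theta_3)}\ x^{\theta_1 - 1} \int_0^{1 - x} y^{\theta_2 - 1}\, (1 - x - y)^{\theta_3 - 1}\, dy \\
&= \frac{\Gamma(\theta)}{\Gamma(\theta_1)\, \Gamma(\theta_2)\, \Gamma(\theta_3)}\ x^{\theta_1 - 1}\, (1 - x)^{\theta_3 - 1} \int_0^{1 - x} y^{\theta_2 - 1} \left(1 - \frac{y}{1 - x}\right)^{\theta_3 - 1} dy.
\end{aligned}$$

Now we substitute $u = \frac{y}{1 - x}$ in the above integral. Then we have

$$\begin{aligned}
f_1(x) &= \frac{\Gamma(\theta)}{\Gamma(\theta_1)\, \Gamma(\theta_2)\, \Gamma(\theta_3)}\ x^{\theta_1 - 1}\, (1 - x)^{\theta_2 + \theta_3 - 1} \int_0^{1} u^{\theta_2 - 1}\, (1 - u)^{\theta_3 - 1}\, du \\
&= \frac{\Gamma(\theta)}{\Gamma(\theta_1)\, \Gamma(\theta_2)\, \Gamma(\theta_3)}\ x^{\theta_1 - 1}\, (1 - x)^{\theta_2 + \theta_3 - 1}\ B(\theta_2, \theta_3) \\
&= \frac{\Gamma(\theta)}{\Gamma(\theta_1)\, \Gamma(\theta_2 + \theta_3)}\ x^{\theta_1 - 1}\, (1 - x)^{\theta_2 + \theta_3 - 1}
\end{aligned}$$

since

$$\int_0^{1} u^{\theta_2 - 1}\, (1 - u)^{\theta_3 - 1}\, du = B(\theta_2, \theta_3) = \frac{\Gamma(\theta_2)\, \Gamma(\theta_3)}{\Gamma(\theta_2 + \theta_3)}.$$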

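This proves that the random variable $X \sim Beta(\theta_1, \theta_2 + \theta_3)$. Similarly,
one can show that the random variable $Y \sim Beta(\theta_2, \theta_1 + \theta_3)$. Now using
Theorem 6.5, we see that

$$\begin{aligned}
E(X) &= \frac{\theta_1}{\theta}, & Var(X) &= \frac{\theta_1\, (\theta - \theta_1)}{\theta^2\, (\theta + 1)}, \\
E(Y) &= \frac{\theta_2}{\theta}, & Var(Y) &= \frac{\theta_2\, (\theta - \theta_2)}{\theta^2\, (\theta + 1)},
\end{aligned}$$

where $\theta = \theta_1 + \theta_2 + \theta_3$.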

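Next, we compute the product moment of $X$ and $Y$. Consider

$$\begin{aligned}
E(XY) &= \int_0^{1} \left[\int_0^{1 - x} x\, y\, f(x,y)\, dy\right] dx \\
&= \frac{\Gamma(\theta)}{\Gamma(\theta_1)\, \Gamma(\theta_2)\, \Gamma(\theta_3)} \int_0^{1} \left[\int_0^{1 - x} x\, y\ x^{\theta_1 - 1}\, y^{\theta_2 - 1}\, (1 - x - y)^{\theta_3 - 1}\, dy\right] dx \\
&= \frac{\Gamma(\theta)}{\Gamma(\theta_1)\, \Gamma(\theta_2)\, \Gamma(\theta_3)} \int_0^{1} \left[\int_0^{1 - x} x^{\theta_1}\, y^{\theta_2}\, (1 - x - y)^{\theta_3 - 1}\, dy\right] dx \\
&= \frac{\Gamma(\theta)}{\Gamma(\theta_1)\, \Gamma(\theta_2)\, \Gamma(\theta_3)} \int_0^{1} x^{\theta_1}\, (1 - x)^{\theta_3 - 1} \left[\int_0^{1 - x} y^{\theta_2} \left(1 - \frac{y}{1 - x}\right)^{\theta_3 - 1} dy\right] dx.
\end{aligned}$$

Now we substitute $u = \frac{y}{1 - x}$ in the above integral to obtain

$$E(XY) = \frac{\Gamma(\theta)}{\Gamma(\theta_1)\, \Gamma(\theta_2)\, \Gamma(\theta_3)} \int_0^{1} x^{\theta_1}\, (1 - x)^{\theta_2 + \theta_3} \left[\int_0^{1} u^{\theta_2}\, (1 - u)^{\theta_3 - 1}\, du\right] dx.$$

Since

$$\int_0^{1} u^{\theta_2}\, (1 - u)^{\theta_3 - 1}\, du = B(\theta_2 + 1, \theta_3)$$

and

$$\int_0^{1} x^{\theta_1}\, (1 - x)^{\theta_2 + \theta_3}\, dx = B(\theta_1 + 1, \theta_2 + \theta_3 + 1),$$

we have

$$\begin{aligned}
E(XY) &= \frac{\Gamma(\theta)}{\Gamma(\theta_1)\, \Gamma(\theta_2)\, \Gamma(\theta_3)}\ B(\theta_2 + 1, \theta_3)\ B(\theta_1 + 1, \theta_2 + \theta_3 + 1) \\
&= \frac{\Gamma(\theta)}{\Gamma(\theta_1)\, \Gamma(\theta_2)\, \Gamma(\theta_3)}\ \frac{\theta_1 \Gamma(\theta_1)\, (\theta_2 + \theta_3)\, \Gamma(\theta_2 + \theta_3)}{\theta\, (\theta + 1)\, \Gamma(\theta)}\ \frac{\theta_2\, \Gamma(\theta_2)\, \Gamma(\theta_3)}{(\theta_2 + \theta_3)\, \Gamma(\theta_2 + \theta_3)} \\
&= \frac{\theta_1\, \theta_2}{\theta\, (\theta + 1)}, \qquad \text{where } \theta = \theta_1 + \theta_2 + \theta_3.
\end{aligned}$$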

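Now it is easy to compute the covariance of $X$ and $Y$ since

$$\begin{aligned}
Cov(X,Y) &= E(XY) - E(X)\, E(Y) \\
&= \frac{\theta_1\, \theta_2}{\theta\, (\theta + 1)} - \frac{\theta_1}{\theta}\, \frac{\theta_2}{\theta} \\
&= -\frac{\theta_1\, \theta_2}{\theta^2\, (\theta + 1)}.
\end{aligned}$$

The proof of the theorem is now complete.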
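The beta-function identity used for $E(XY)$ can be confirmed numerically. The following minimal sketch evaluates the product of beta functions from the proof with log-gamma arithmetic and compares it with the closed forms $\theta_1\theta_2/(\theta(\theta+1))$ and $-\theta_1\theta_2/(\theta^2(\theta+1))$. The function names are illustrative.

```python
import math


def log_beta(p, q):
    """Logarithm of the beta function B(p, q)."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def product_moment(t1, t2, t3):
    """E(XY) from the beta-function expression in the proof."""
    theta = t1 + t2 + t3
    log_const = (math.lgamma(theta) - math.lgamma(t1)
                 - math.lgamma(t2) - math.lgamma(t3))
    return math.exp(log_const
                    + log_beta(t2 + 1, t3)
                    + log_beta(t1 + 1, t2 + t3 + 1))


def covariance(t1, t2, t3):
    """Cov(X, Y) = E(XY) - E(X) E(Y) with E(X) = t1/theta, E(Y) = t2/theta."""
    theta = t1 + t2 + t3
    return product_moment(t1, t2, t3) - (t1 / theta) * (t2 / theta)


if __name__ == "__main__":
    t1, t2, t3 = 2.0, 3.0, 4.0
    theta = t1 + t2 + t3
    print(product_moment(t1, t2, t3), t1 * t2 / (theta * (theta + 1)))
    print(covariance(t1, t2, t3), -t1 * t2 / (theta ** 2 * (theta + 1)))
```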

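The correlation coefficient of $X$ and $Y$ can be computed using the covariance as

$$\rho = \frac{Cov(X,Y)}{\sqrt{Var(X)\, Var(Y)}} = -\sqrt{\frac{\theta_1\, \theta_2}{(\theta_1 + \theta_3)(\theta_2 + \theta_3)}}.$$

The next theorem states some properties of the conditional density functions
$f(x/y)$ and $f(y/x)$.

Theorem 12.12. Let $(X,Y) \sim Beta(\theta_1, \theta_2, \theta_3)$ where $\theta_1$, $\theta_2$ and $\theta_3$ are
positive parameters. Then

$$\begin{aligned}
E(Y/x) &= \frac{\theta_2\, (1 - x)}{\theta_2 + \theta_3}, & Var(Y/x) &= \frac{\theta_2\, \theta_3\, (1 - x)^2}{(\theta_2 + \theta_3)^2\, (\theta_2 + \theta_3 + 1)}, \\
E(X/y) &= \frac{\theta_1\, (1 - y)}{\theta_1 + \theta_3}, & Var(X/y) &= \frac{\theta_1\, \theta_3\, (1 - y)^2}{(\theta_1 + \theta_3)^2\, (\theta_1 + \theta_3 + 1)}.
\end{aligned}$$

Proof: We know that if $(X,Y) \sim Beta(\theta_1, \theta_2, \theta_3)$, the random variable
$X \sim Beta(\theta_1, \theta_2 + \theta_3)$. Therefore

$$\begin{aligned}
f(y/x) &= \frac{f(x,y)}{f_1(x)} \\
&= \frac{1}{1 - x}\ \frac{\Gamma(\theta_2 + \theta_3)}{\Gamma(\theta_2)\, \Gamma(\theta_3)} \left(\frac{y}{1 - x}\right)^{\theta_2 - 1} \left(1 - \frac{y}{1 - x}\right)^{\theta_3 - 1}
\end{aligned}$$

for all $0 < y < 1 - x$. Thus the random variable $\left.\frac{Y}{1 - x}\right|_{X = x}$ is a beta random
variable with parameters $\theta_2$ and $\theta_3$.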

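Now we compute the conditional expectation of $Y/x$. Consider

$$\begin{aligned}
E(Y/x) &= \int_0^{1 - x} y\, f(y/x)\, dy \\
&= \frac{1}{1 - x}\ \frac{\Gamma(\theta_2 + \theta_3)}{\Gamma(\theta_2)\, \Gamma(\theta_3)} \int_0^{1 - x} y \left(\frac{y}{1 - x}\right)^{\theta_2 - 1} \left(1 - \frac{y}{1 - x}\right)^{\theta_3 - 1} dy.
\end{aligned}$$

Now we substitute $u = \frac{y}{1 - x}$ in the above integral to obtain

$$\begin{aligned}
E(Y/x) &= \frac{\Gamma(\theta_2 + \theta_3)}{\Gamma(\theta_2)\, \Gamma(\theta_3)}\ (1 - x) \int_0^{1} u^{\theta_2}\, (1 - u)^{\theta_3 - 1}\, du \\
&= \frac{\Gamma(\theta_2 + \theta_3)}{\Gamma(\theta_2)\, \Gamma(\theta_3)}\ (1 - x)\ B(\theta_2 + 1, \theta_3) \\
&= \frac{\Gamma(\theta_2 + \theta_3)}{\Gamma(\theta_2)\, \Gamma(\theta_3)}\ (1 - x)\ \frac{\theta_2\, \Gamma(\theta_2)\, \Gamma(\theta_3)}{(\theta_2 + \theta_3)\, \Gamma(\theta_2 + \theta_3)} \\
&= \frac{\theta_2}{\theta_2 + \theta_3}\ (1 - x).
\end{aligned}$$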

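Next, we compute $E(Y^2/x)$. Consider

$$\begin{aligned}
E(Y^2/x) &= \int_0^{1 - x} y^2 f(y/x)\, dy \\
&= \frac{1}{1 - x}\ \frac{\Gamma(\theta_2 + \theta_3)}{\Gamma(\theta_2)\, \Gamma(\theta_3)} \int_0^{1 - x} y^2 \left(\frac{y}{1 - x}\right)^{\theta_2 - 1} \left(1 - \frac{y}{1 - x}\right)^{\theta_3 - 1} dy \\
&= \frac{\Gamma(\theta_2 + \theta_3)}{\Gamma(\theta_2)\, \Gamma(\theta_3)}\ (1 - x)^2 \int_0^{1} u^{\theta_2 + 1}\, (1 - u)^{\theta_3 - 1}\, du \\
&= \frac{\Gamma(\theta_2 + \theta_3)}{\Gamma(\theta_2)\, \Gamma(\theta_3)}\ (1 - x)^2\ B(\theta_2 + 2, \theta_3) \\
&= \frac{\Gamma(\theta_2 + \theta_3)}{\Gamma(\theta_2)\, \Gamma(\theta_3)}\ (1 - x)^2\ \frac{(\theta_2 + 1)\, \theta_2\, \Gamma(\theta_2)\, \Gamma(\theta_3)}{(\theta_2 + \theta_3 + 1)\, (\theta_2 + \theta_3)\, \Gamma(\theta_2 + \theta_3)} \\
&= \frac{(\theta_2 + 1)\, \theta_2}{(\theta_2 + \theta_3 + 1)\, (\theta_2 + \theta_3)}\ (1 - x)^2.
\end{aligned}$$

Therefore

$$Var(Y/x) = E(Y^2/x) - E(Y/x)^2 = \frac{\theta_2\, \theta_3\, (1 - x)^2}{(\theta_2 + \theta_3)^2\, (\theta_2 + \theta_3 + 1)}.$$

Similarly, one can compute $E(X/y)$ and $Var(X/y)$. We leave this computation
to the reader. The proof of the theorem is now complete.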
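The conditional moments in Theorem 12.12 can be checked by direct numerical integration of the conditional density $f(y/x)$ derived in the proof. The sketch below uses a midpoint rule; it assumes $\theta_2, \theta_3 \ge 1$ so that the integrand stays bounded at the endpoints, and the function names are illustrative.

```python
import math


def conditional_density(y, x, t2, t3):
    """f(y/x) for the bivariate beta distribution, 0 < y < 1 - x."""
    u = y / (1 - x)
    const = math.exp(math.lgamma(t2 + t3) - math.lgamma(t2) - math.lgamma(t3))
    return const * u ** (t2 - 1) * (1 - u) ** (t3 - 1) / (1 - x)


def conditional_moments(x, t2, t3, n=20000):
    """Midpoint-rule estimates of E(Y/x) and Var(Y/x)."""
    h = (1 - x) / n
    m1 = 0.0
    m2 = 0.0
    for i in range(n):
        y = (i + 0.5) * h
        w = conditional_density(y, x, t2, t3) * h
        m1 += y * w
        m2 += y * y * w
    return m1, m2 - m1 * m1


if __name__ == "__main__":
    x, t2, t3 = 0.2, 2.0, 3.0
    print(conditional_moments(x, t2, t3))
    print(t2 * (1 - x) / (t2 + t3),
          t2 * t3 * (1 - x) ** 2 / ((t2 + t3) ** 2 * (t2 + t3 + 1)))
```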


The Dirichlet distribution can be extended from the unit square $(0,1)^2$
to an arbitrary rectangle $(a_1, b_1) \times (a_2, b_2)$.

Definition 12.6. A continuous bivariate random variable $(X_1, X_2)$ is said to
have the generalized bivariate beta distribution if its joint probability density
function is of the form

$$f(x_1, x_2) = \frac{\Gamma(\theta_1 + \theta_2 + \theta_3)}{\Gamma(\theta_1)\, \Gamma(\theta_2)\, \Gamma(\theta_3)} \prod_{k=1}^{2} \left(\frac{x_k - a_k}{b_k - a_k}\right)^{\theta_k - 1} \left(1 - \frac{x_k - a_k}{b_k - a_k}\right)^{\theta_3 - 1}$$

where $0 < x_1, x_2,\ x_1 + x_2 < 1$ and $\theta_1, \theta_2, \theta_3, a_1, b_1, a_2, b_2$ are parameters. We
will denote a bivariate generalized beta random variable $(X,Y)$ with positive
parameters $\theta_1$, $\theta_2$ and $\theta_3$ by writing $(X,Y) \sim GBeta(\theta_1, \theta_2, \theta_3, a_1, b_1, a_2, b_2)$.

It can be shown that if $X_k = (b_k - a_k)\, Y_k + a_k$ (for $k = 1, 2$) and
$(Y_1, Y_2) \sim Beta(\theta_1, \theta_2, \theta_3)$, then $(X_1, X_2) \sim GBeta(\theta_1, \theta_2, \theta_3, a_1, b_1, a_2, b_2)$.
Therefore, by Theorem 12.11, we have the following theorem.

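Theorem 12.13. Let $(X,Y) \sim GBeta(\theta_1, \theta_2, \theta_3, a_1, b_1, a_2, b_2)$, where $\theta_1$, $\theta_2$
and $\theta_3$ are positive a priori chosen parameters. Then $X \sim Beta(\theta_1, \theta_2 + \theta_3)$
and $Y \sim Beta(\theta_2, \theta_1 + \theta_3)$ and

$$\begin{aligned}
E(X) &= (b_1 - a_1)\, \frac{\theta_1}{\theta} + a_1, & Var(X) &= (b_1 - a_1)^2\ \frac{\theta_1\, (\theta - \theta_1)}{\theta^2\, (\theta + 1)}, \\
E(Y) &= (b_2 - a_2)\, \frac{\theta_2}{\theta} + a_2, & Var(Y) &= (b_2 - a_2)^2\ \frac{\theta_2\, (\theta - \theta_2)}{\theta^2\, (\theta + 1)}, \\
Cov(X,Y) &= -(b_1 - a_1)(b_2 - a_2)\ \frac{\theta_1\, \theta_2}{\theta^2\, (\theta + 1)},
\end{aligned}$$

where $\theta = \theta_1 + \theta_2 + \theta_3$.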


Another generalization of the bivariate beta distribution is the following: 


Definition 12.7. A continuous bivariate random variable $(X_1, X_2)$ is said to
have the generalized bivariate beta distribution if its joint probability density
function is of the form

$$f(x_1, x_2) = \frac{1}{B(\alpha_1, \beta_1)\, B(\alpha_2, \beta_2)}\ x_1^{\alpha_1 - 1}\, (1 - x_1)^{\beta_1 - \alpha_2 - \beta_2}\ x_2^{\alpha_2 - 1}\, (1 - x_1 - x_2)^{\beta_2 - 1}$$

where $0 < x_1, x_2,\ x_1 + x_2 < 1$ and $\alpha_1, \alpha_2, \beta_1, \beta_2$ are parameters.

It is not difficult to see that $X \sim Beta(\alpha_1, \beta_1)$ and $Y \sim Beta(\alpha_2, \beta_2)$.


12.5. Bivariate Normal Distribution 


The bivariate normal distribution is a generalization of the univariate 
normal distribution. The first statistical treatment of the bivariate normal 
distribution was given by Galton and Dickson in 1886. Although there are 
several other bivariate distributions as discussed above, the bivariate normal 
distribution still plays a dominant role. The development of normal theory 
has been intensive and most thinking has centered upon bivariate normal 
distribution because of the relative simplicity of mathematical treatment of 
it. In this section, we give an in-depth treatment of the bivariate normal
distribution.


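Definition 12.8. A continuous bivariate random variable $(X,Y)$ is said to
have the bivariate normal distribution if its joint probability density function
is of the form

$$f(x,y) = \frac{1}{2\pi\, \sigma_1\, \sigma_2\, \sqrt{1 - \rho^2}}\ e^{-\frac{1}{2} Q(x,y)}, \qquad -\infty < x, y < \infty,$$

where $\mu_1, \mu_2 \in \mathbb{R}$, $\sigma_1, \sigma_2 \in (0, \infty)$ and $\rho \in (-1, 1)$ are parameters, and

$$Q(x,y) := \frac{1}{1 - \rho^2} \left[\left(\frac{x - \mu_1}{\sigma_1}\right)^2 - 2\rho \left(\frac{x - \mu_1}{\sigma_1}\right) \left(\frac{y - \mu_2}{\sigma_2}\right) + \left(\frac{y - \mu_2}{\sigma_2}\right)^2\right].$$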

As usual, we denote this bivariate normal random variable by writing
$(X,Y) \sim N(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$. The graph of $f(x,y)$ has the shape of a “mountain”.
The pair $(\mu_1, \mu_2)$ tells us where the center of the mountain is located
in the $(x,y)$-plane, while $\sigma_1^2$ and $\sigma_2^2$ measure the spread of this mountain in
the $x$-direction and $y$-direction, respectively. The parameter $\rho$ determines
the shape and orientation on the $(x,y)$-plane of the mountain. The following
figures show the graphs of the bivariate normal distributions with different
values of the correlation coefficient $\rho$. The first two figures illustrate the graph of
the bivariate normal distribution with $\rho = 0$, $\mu_1 = \mu_2 = 0$, and $\sigma_1 = \sigma_2 = 1$
and the equi-density plots. The next two figures illustrate the graph of the
bivariate normal distribution with $\rho = 0.5$, $\mu_1 = \mu_2 = 0$, and $\sigma_1 = \sigma_2 = 0.5$
and the equi-density plots. The last two figures illustrate the graph of the
bivariate normal distribution with $\rho = -0.5$, $\mu_1 = \mu_2 = 0$, and $\sigma_1 = \sigma_2 = 0.5$
and the equi-density plots.

One of the remarkable features of the bivariate normal distribution is
that if we vertically slice the graph of $f(x,y)$ along any direction, we obtain
a univariate normal distribution. In particular, if we vertically slice the graph
of $f(x,y)$ along the $x$-axis, we obtain a univariate normal distribution.
That is, the marginal of $f(x,y)$ is again normal. One can show that the
marginals of $f(x,y)$ are given by

$$f_1(x) = \frac{1}{\sigma_1 \sqrt{2\pi}}\ e^{-\frac{1}{2}\left(\frac{x - \mu_1}{\sigma_1}\right)^2}$$

and

$$f_2(y) = \frac{1}{\sigma_2 \sqrt{2\pi}}\ e^{-\frac{1}{2}\left(\frac{y - \mu_2}{\sigma_2}\right)^2}.$$

In view of these, the following theorem is obvious.

Theorem 12.14. If $(X,Y) \sim N(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$, then

$$\begin{aligned}
E(X) &= \mu_1 \\
E(Y) &= \mu_2 \\
Var(X) &= \sigma_1^2 \\
Var(Y) &= \sigma_2^2 \\
Corr(X,Y) &= \rho \\
M(s,t) &= e^{\mu_1 s + \mu_2 t + \frac{1}{2}\left(\sigma_1^2 s^2 + 2\rho\, \sigma_1 \sigma_2\, s t + \sigma_2^2 t^2\right)}.
\end{aligned}$$


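Proof: It is easy to establish the formulae for $E(X)$, $E(Y)$, $Var(X)$ and
$Var(Y)$. Here we only establish the moment generating function. Since
$(X,Y) \sim N(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$, we have $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$.
Further, for any $s$ and $t$, the random variable $W = sX + tY$ is again normal
with

$$\mu_W = s\mu_1 + t\mu_2 \qquad \text{and} \qquad \sigma_W^2 = s^2 \sigma_1^2 + 2st\rho\, \sigma_1 \sigma_2 + t^2 \sigma_2^2.$$

Since $W$ is a normal random variable, its moment generating function is given
by

$$M(\tau) = e^{\mu_W \tau + \frac{1}{2} \tau^2 \sigma_W^2}.$$

The joint moment generating function of $(X,Y)$ is

$$\begin{aligned}
M(s,t) &= E\left(e^{sX + tY}\right) \\
&= e^{\mu_W + \frac{1}{2} \sigma_W^2} \\
&= e^{\mu_1 s + \mu_2 t + \frac{1}{2}\left(\sigma_1^2 s^2 + 2\rho\, \sigma_1 \sigma_2\, s t + \sigma_2^2 t^2\right)}.
\end{aligned}$$

This completes the proof of the theorem.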
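The moment generating function can be compared with a Monte Carlo estimate. The sketch below is illustrative only: it assumes the standard construction $X = \mu_1 + \sigma_1 Z_1$, $Y = \mu_2 + \sigma_2(\rho Z_1 + \sqrt{1-\rho^2}\,Z_2)$ with independent standard normal $Z_1, Z_2$ (a construction not given in the text), and the function names are hypothetical.

```python
import math
import random


def sample_bivariate_normal(mu1, mu2, s1, s2, rho, n, seed=0):
    """Draw n pairs from N(mu1, mu2, s1, s2, rho) via two independent normals."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = mu1 + s1 * z1
        y = mu2 + s2 * (rho * z1 + math.sqrt(1 - rho * rho) * z2)
        pairs.append((x, y))
    return pairs


def mgf_closed_form(s, t, mu1, mu2, s1, s2, rho):
    """M(s, t) from Theorem 12.14."""
    quad = s1 ** 2 * s ** 2 + 2 * rho * s1 * s2 * s * t + s2 ** 2 * t ** 2
    return math.exp(mu1 * s + mu2 * t + 0.5 * quad)


def mgf_monte_carlo(pairs, s, t):
    """Sample average of exp(s X + t Y)."""
    return sum(math.exp(s * x + t * y) for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    params = (1.0, -1.0, 1.0, 0.5, 0.5)
    pairs = sample_bivariate_normal(*params, n=100000)
    print(mgf_monte_carlo(pairs, 0.3, 0.2))
    print(mgf_closed_form(0.3, 0.2, *params))
```

The two values should agree up to Monte Carlo error, which shrinks roughly like $1/\sqrt{n}$.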

It can be shown that the conditional density of $Y$ given $X = x$ is

$$f(y/x) = \frac{1}{\sigma_2 \sqrt{2\pi\, (1 - \rho^2)}}\ e^{-\frac{1}{2}\left(\frac{y - b}{\sigma_2 \sqrt{1 - \rho^2}}\right)^2}$$

where

$$b = \mu_2 + \rho\, \frac{\sigma_2}{\sigma_1}\, (x - \mu_1).$$

Similarly, the conditional density $f(x/y)$ is

$$f(x/y) = \frac{1}{\sigma_1 \sqrt{2\pi\, (1 - \rho^2)}}\ e^{-\frac{1}{2}\left(\frac{x - c}{\sigma_1 \sqrt{1 - \rho^2}}\right)^2}$$

where

$$c = \mu_1 + \rho\, \frac{\sigma_1}{\sigma_2}\, (y - \mu_2).$$

In view of the form of $f(y/x)$ and $f(x/y)$, the following theorem is transparent.

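Theorem 12.15. If $(X,Y) \sim N(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$, then

$$\begin{aligned}
E(Y/x) &= \mu_2 + \rho\, \frac{\sigma_2}{\sigma_1}\, (x - \mu_1) \\
E(X/y) &= \mu_1 + \rho\, \frac{\sigma_1}{\sigma_2}\, (y - \mu_2) \\
Var(Y/x) &= \sigma_2^2\, (1 - \rho^2) \\
Var(X/y) &= \sigma_1^2\, (1 - \rho^2).
\end{aligned}$$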
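As a small worked illustration of Theorem 12.15, the following sketch returns the conditional mean and variance of $Y$ given $X = x$. The function name and argument order are illustrative choices, not notation from the text.

```python
def conditional_y_given_x(x, mu1, mu2, sigma1, sigma2, rho):
    """Return (E(Y/x), Var(Y/x)) for a bivariate normal pair (X, Y)."""
    mean = mu2 + rho * (sigma2 / sigma1) * (x - mu1)   # regression line in x
    var = sigma2 ** 2 * (1 - rho ** 2)                 # does not depend on x
    return mean, var


if __name__ == "__main__":
    # Example: mu1 = 1, mu2 = 0, sigma1 = 2, sigma2 = 3, rho = 0.5, observed x = 2.
    print(conditional_y_given_x(2.0, 1.0, 0.0, 2.0, 3.0, 0.5))
```

For these values the conditional mean is $0 + 0.5 \cdot (3/2) \cdot (2 - 1) = 0.75$ and the conditional variance is $9 \cdot (1 - 0.25) = 6.75$. By symmetry, $E(X/y)$ and $Var(X/y)$ follow by swapping the roles of the two coordinates.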

We have seen that if (X, Y) has a bivariate normal distribution, then the 
distributions of X and Y are also normal. However, the converse of this is 
not true. That is if X and Y have normal distributions as their marginals, 
then their joint distribution is not necessarily bivariate normal. 

Now we present some characterization theorems concerning the bivariate 
normal distribution. The first theorem is due to Cramer (1941). 

Theorem 12.16. The random variables X and Y have a joint bivariate 
normal distribution if and only if every linear combination of X and Y has 
a univariate normal distribution. 

Theorem 12.17. The random variables X and Y with unit variances and 
correlation coefficient p have a joint bivariate normal distribution if and only 
if 

holds for an arbitrary function $g(x,y)$ of two variables.

Many interesting characterizations of bivariate normal distribution can 
be found in the survey paper of Hamedani (1992). 

12.6. Bivariate Logistic Distributions 

In this section, we study two bivariate logistic distributions. A univariate
logistic distribution is often considered as an alternative to the univariate
normal distribution. The univariate logistic distribution has a shape very
close to that of a univariate normal distribution but has heavier tails than
the normal. This distribution is also used as an alternative to the univariate
Weibull distribution in life-testing. The univariate logistic distribution has
the following probability density function

$$f(x) = \frac{\pi}{\sigma \sqrt{3}}\ \frac{e^{-\frac{\pi}{\sqrt{3}}\left(\frac{x - \mu}{\sigma}\right)}}{\left[1 + e^{-\frac{\pi}{\sqrt{3}}\left(\frac{x - \mu}{\sigma}\right)}\right]^2}, \qquad -\infty < x < \infty,$$

where $-\infty < \mu < \infty$ and $\sigma > 0$ are parameters. The parameter $\mu$ is the
mean and the parameter $\sigma$ is the standard deviation of the distribution. A
random variable $X$ with the above logistic distribution will be denoted by
$X \sim LOG(\mu, \sigma)$. It is well known that the moment generating function of the
univariate logistic distribution is given by

$$M(t) = e^{\mu t}\ \Gamma\!\left(1 + \frac{\sqrt{3}}{\pi}\, \sigma t\right) \Gamma\!\left(1 - \frac{\sqrt{3}}{\pi}\, \sigma t\right)$$

for $|t| < \frac{\pi}{\sigma \sqrt{3}}$. We give a brief proof of the above result for $\mu = 0$ and $\sigma = \frac{\pi}{\sqrt{3}}$.
Then with these assumptions, the logistic density function reduces to

$$f(x) = \frac{e^{-x}}{(1 + e^{-x})^2}.$$

The moment generating function with respect to this density function is 


/ OO 

etx f( x ) i 

-oo 


(l + e- 1 ) 5 


r°° P -x 

= L (e_,) ox^f dz 

r 1 ^ 

= / ( z~ x — l) dz where z = 

Jo 

= [ z t (l-z)~ t dz 
Jo 


— B( 1 + t, 1 — t) 

_ r(i + t)r(i-t) 
r(i + t + i — t) 
_ r(i +1) r(i -1) 
r (2) 

= r(i +1) r(i -1) 

= tcosec(t). 
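The chain of equalities above can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names are illustrative, and the integral is approximated by a midpoint rule on the truncated range [-60, 60], which is an assumption that is adequate only for |t| well inside (-1, 1).

```python
import math

def logistic_mgf_gamma(t):
    """Gamma(1 + t) * Gamma(1 - t), the form obtained from the beta function (|t| < 1)."""
    return math.gamma(1 + t) * math.gamma(1 - t)

def logistic_mgf_closed(t):
    """pi * t / sin(pi * t); the value at t = 0 is the limit 1."""
    return 1.0 if t == 0 else math.pi * t / math.sin(math.pi * t)

def logistic_mgf_numeric(t, lo=-60.0, hi=60.0, steps=100_000):
    """Midpoint-rule approximation of the integral of e^{tx} e^{-x} / (1 + e^{-x})^2."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += math.exp(t * x) * math.exp(-x) / (1.0 + math.exp(-x)) ** 2
    return total * h

if __name__ == "__main__":
    for t in (-0.5, 0.0, 0.3):
        print(t, logistic_mgf_gamma(t), logistic_mgf_closed(t), logistic_mgf_numeric(t))
```

The three columns should agree to several decimal places for each t, which is consistent with the reflection formula used in the last step.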
Recall that the marginals and conditionals of the bivariate normal
distribution are univariate normal. This beautiful property enjoyed by the
bivariate normal distribution is apparently lacking in the other bivariate
distributions we have discussed so far. If we cannot define a bivariate logistic
distribution so that the conditionals and marginals are univariate logistic,
then we would like to have at least one of the marginal distributions logistic
and the conditional distribution of the other variable logistic. The following
bivariate logistic distribution is due to Gumbel (1961).

Definition 12.9. A continuous bivariate random variable (X, Y) is said to
have the bivariate logistic distribution of first kind if its joint probability
density function is of the form

\[ f(x,y) = \frac{2\pi^{2}\, e^{-\frac{\pi}{\sqrt{3}}\left(\frac{x-\mu_1}{\sigma_1} + \frac{y-\mu_2}{\sigma_2}\right)}}{3\,\sigma_1 \sigma_2 \left[1 + e^{-\frac{\pi}{\sqrt{3}}\left(\frac{x-\mu_1}{\sigma_1}\right)} + e^{-\frac{\pi}{\sqrt{3}}\left(\frac{y-\mu_2}{\sigma_2}\right)}\right]^{3}}, \qquad -\infty < x, y < \infty, \]

where $-\infty < \mu_1, \mu_2 < \infty$ and $0 < \sigma_1, \sigma_2 < \infty$ are parameters. If a random
variable (X, Y) has a bivariate logistic distribution of first kind, then we
express this by writing $(X, Y) \sim LOGF(\mu_1, \mu_2, \sigma_1, \sigma_2)$. The following figures
show the graph of f(x, y) with $\mu_1 = 0 = \mu_2$ and $\sigma_1 = 1 = \sigma_2$ and the
equi-density plots.

It can be shown that marginally, X is a logistic random variable. That
is, $X \sim LOG(\mu_1, \sigma_1)$. Similarly, $Y \sim LOG(\mu_2, \sigma_2)$. These facts lead us to
the following theorem.


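Theorem 12.18. If the random variable $(X, Y) \sim LOGF(\mu_1, \mu_2, \sigma_1, \sigma_2)$,
then

\begin{align*}
E(X) &= \mu_1 \\
E(Y) &= \mu_2 \\
Var(X) &= \sigma_1^{2} \\
Var(Y) &= \sigma_2^{2} \\
E(XY) &= \tfrac{1}{2}\, \sigma_1 \sigma_2 + \mu_1 \mu_2,
\end{align*}

and the moment generating function is given by

\[ M(s,t) = e^{\mu_1 s + \mu_2 t}\; \Gamma\!\left(1 + \frac{(\sigma_1 s + \sigma_2 t)\sqrt{3}}{\pi}\right) \Gamma\!\left(1 - \frac{\sigma_1 s \sqrt{3}}{\pi}\right) \Gamma\!\left(1 - \frac{\sigma_2 t \sqrt{3}}{\pi}\right) \]

for $|s| < \frac{\pi}{\sigma_1 \sqrt{3}}$ and $|t| < \frac{\pi}{\sigma_2 \sqrt{3}}$.

It is an easy exercise to see that if the random variables X and Y have
a joint bivariate logistic distribution of first kind, then the correlation
between X and Y is $\frac{1}{2}$. This can be considered as one of the drawbacks of
this distribution in the sense that it limits the dependence between the
random variables X and Y.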


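The conditional density of Y given X = x is

\[ f(y/x) = \frac{2\pi\, e^{-\frac{\pi}{\sqrt{3}}\left(\frac{y-\mu_2}{\sigma_2}\right)} \left[1 + e^{-\frac{\pi}{\sqrt{3}}\left(\frac{x-\mu_1}{\sigma_1}\right)}\right]^{2}}{\sigma_2 \sqrt{3}\, \left[1 + e^{-\frac{\pi}{\sqrt{3}}\left(\frac{x-\mu_1}{\sigma_1}\right)} + e^{-\frac{\pi}{\sqrt{3}}\left(\frac{y-\mu_2}{\sigma_2}\right)}\right]^{3}}. \]

Similarly the conditional density of X given Y = y is

\[ f(x/y) = \frac{2\pi\, e^{-\frac{\pi}{\sqrt{3}}\left(\frac{x-\mu_1}{\sigma_1}\right)} \left[1 + e^{-\frac{\pi}{\sqrt{3}}\left(\frac{y-\mu_2}{\sigma_2}\right)}\right]^{2}}{\sigma_1 \sqrt{3}\, \left[1 + e^{-\frac{\pi}{\sqrt{3}}\left(\frac{x-\mu_1}{\sigma_1}\right)} + e^{-\frac{\pi}{\sqrt{3}}\left(\frac{y-\mu_2}{\sigma_2}\right)}\right]^{3}}. \]

Using these densities, the next theorem offers various conditional properties
of the bivariate logistic distribution.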

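Theorem 12.19. If the random variable $(X, Y) \sim LOGF(\mu_1, \mu_2, \sigma_1, \sigma_2)$,
then

\begin{align*}
E(Y/x) &= 1 - \ln\left(1 + e^{-\frac{\pi}{\sqrt{3}}\left(\frac{x-\mu_1}{\sigma_1}\right)}\right) \\
E(X/y) &= 1 - \ln\left(1 + e^{-\frac{\pi}{\sqrt{3}}\left(\frac{y-\mu_2}{\sigma_2}\right)}\right) \\
Var(Y/x) &= \frac{\pi^{2}}{3} - 1 \\
Var(X/y) &= \frac{\pi^{2}}{3} - 1.
\end{align*}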


It was pointed out earlier that one of the drawbacks of this bivariate 
logistic distribution of first kind is that it limits the dependence of the ran¬ 
dom variables. The following bivariate logistic distribution was suggested to 
rectify this drawback. 

Definition 12.10. A continuous bivariate random variable (X, Y) is said to
have the bivariate logistic distribution of second kind if its joint probability
density function is of the form

\[ f(x,y) = \frac{\left[\phi_\alpha(x,y)\right]^{1-2\alpha}}{\left[1 + \phi_\alpha(x,y)\right]^{2}} \left( \frac{\phi_\alpha(x,y) - 1}{\phi_\alpha(x,y) + 1} + \alpha \right) e^{-\alpha(x+y)}, \qquad -\infty < x, y < \infty, \]

where $\alpha > 0$ is a parameter, and $\phi_\alpha(x,y) := \left(e^{-\alpha x} + e^{-\alpha y}\right)^{\frac{1}{\alpha}}$. As before, we
denote a bivariate logistic random variable of second kind (X, Y) by writing
$(X, Y) \sim LOGS(\alpha)$.

The marginal densities of X and Y are again logistic and they are given by

\[ f_1(x) = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}}, \qquad -\infty < x < \infty \]

and

\[ f_2(y) = \frac{e^{-y}}{\left(1 + e^{-y}\right)^{2}}, \qquad -\infty < y < \infty. \]

It was shown by Oliveira (1961) that if $(X, Y) \sim LOGS(\alpha)$, then the
correlation between X and Y is

\[ \rho(X,Y) = 1 - \frac{1}{2\alpha^{2}}. \]

12.7. Review Exercises

1. If $(X,Y) \sim N(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$ with $Q(x,y) = x^2 + 2y^2 - 2xy + 2x - 2y + 1$,
then what is the value of the conditional variance of Y given the event X = x?

2. If $(X,Y) \sim N(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$ with

Q(x, y) = -jC [(x + 3) 2 - 16(x + 3)(y- 2) + 4(y- 2) 2 ] , 

then what is the value of the conditional expectation of Y given X = x? 

3. If $(X,Y) \sim N(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$, then what is the correlation coefficient of
the random variables U and V, where U = 2X + 3Y and V = 2X - 3Y?

4. Let the random variables X and Y denote the height and weight of
wild turkeys. If the random variables X and Y have a bivariate normal
distribution with $\mu_1 = 18$ inches, $\mu_2 = 15$ pounds, $\sigma_1 = 3$ inches, $\sigma_2 = 2$
pounds, and $\rho = 0.75$, then what is the expected weight of one of these wild
turkeys that is 17 inches tall?

5. If $(X,Y) \sim N(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$, then what is the moment generating
function of the random variables U and V, where U = 7X + 3Y and V =
7X - 3Y?

6. Let (X, Y) have a bivariate normal distribution. The mean of X is 10 and
the variance of X is 12. The mean of Y is -5 and the variance of Y is 5. If
the covariance of X and Y is 4, then what is the probability that X + Y is
greater than 10?

7. Let X and Y have a bivariate normal distribution with means $\mu_X = 5$
and $\mu_Y = 6$, standard deviations $\sigma_X = 3$ and $\sigma_Y = 2$, and covariance
$\sigma_{XY} = 2$. Let $\Phi$ denote the cumulative distribution function of a normal
random variable with mean 0 and variance 1. What is $P(2 \le X - Y \le 5)$ in
terms of $\Phi$?

8. If $(X,Y) \sim N(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$ with $Q(x,y) = -x^2 + xy - 2y^2$, then what
is the conditional distribution of X given the event Y = y?

9. If $(X,Y) \sim GAMK(\alpha, \theta)$, where $0 < \alpha < \infty$ and $0 \le \theta < 1$ are
parameters, then show that the moment generating function is given by

\[ M(s,t) = \left( \frac{1}{(1-s)(1-t) - \theta\, s\, t} \right)^{\alpha}. \]

10. Let X and Y have a bivariate gamma distribution of Kibble with
parameters $\alpha = 1$ and $0 \le \theta < 1$. What is the probability that the random
variable 7 A' is less than 

11. If $(X, Y) \sim GAMC(\alpha, \beta, \gamma)$, then what are the regression and scedastic
curves of Y on X?

12. The position of a random point (X, Y) is equally probable anywhere on 
a circle of radius R and whose center is at the origin. What is the probability 
density function of each of the random variables X and Y ? Are the random 
variables X and Y independent? 

13. If $(X, Y) \sim GAMC(\alpha, \beta, \gamma)$, what is the correlation coefficient of the
random variables X and Y?

14. Let X and Y have a bivariate exponential distribution of Gumbel with
parameter $\theta > 0$. What is the regression curve of Y on X?

15. A screen of a navigational radar station represents a circle of radius 12 
inches. As a result of noise, a spot may appear with its center at any point 
of the circle. Find the expected value and variance of the distance between 
the center of the spot and the center of the circle. 

16. Let X and Y have a bivariate normal distribution. Which of the following
statements must be true?

(I) Any nonzero linear combination of X and Y has a normal distribution.

(II) E(Y/X = x) is a linear function of x. 

(III) Var(Y/X = x) < Var(Y). 

17. If $(X, Y) \sim LOGS(\alpha)$, then what is the correlation between X and Y?

18. If $(X, Y) \sim LOGF(\mu_1, \mu_2, \sigma_1, \sigma_2)$, then what is the correlation between
the random variables X and Y?

19. If $(X, Y) \sim LOGF(\mu_1, \mu_2, \sigma_1, \sigma_2)$, then show that marginally X and Y
are univariate logistic.

20. If $(X, Y) \sim LOGF(\mu_1, \mu_2, \sigma_1, \sigma_2)$, then what is the scedastic curve of
the random variable Y on X?

Chapter 13

SEQUENCES
OF
RANDOM VARIABLES
AND
ORDER STATISTICS


In this chapter, we generalize some of the results we have studied in the
previous chapters, because these generalizations are needed in the subsequent
chapters relating to mathematical statistics. We also examine the weak law
of large numbers, Bernoulli's law of large numbers, the strong law of large
numbers, and the central limit theorem. Further, we treat the order
statistics and percentiles.


13.1. Distribution of sample mean and variance 


Consider a random experiment. Let X be the random variable associated
with this experiment, and let f(x) be the probability density function of X.
Let us repeat this experiment n times, and let $X_k$ be the random variable
associated with the $k$th repetition. Then the collection of the random variables
$\{X_1, X_2, ..., X_n\}$ is a random sample of size n. Hereafter, we simply
denote $X_1, X_2, ..., X_n$ as a random sample of size n. The random variables
$X_1, X_2, ..., X_n$ are independent and identically distributed with the common
probability density function f(x).

For a random sample, functions such as the sample mean $\bar{X}$ and the sample
variance $S^2$ are called statistics. In a particular sample, say $x_1, x_2, ..., x_n$, we
observe $\bar{x}$ and $s^2$. We may consider

\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \]

and

\[ S^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2 \]

as random variables, and $\bar{x}$ and $s^2$ are the realizations from a particular
sample.

In this section, we are mainly interested in finding the probability
distributions of the sample mean $\bar{X}$ and the sample variance $S^2$, that is, the
distributions of the statistics of samples.

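Example 13.1. Let $X_1$ and $X_2$ be a random sample of size 2 from a
distribution with probability density function

\[ f(x) = \begin{cases} 6x(1-x) & \text{if } 0 < x < 1 \\ 0 & \text{otherwise.} \end{cases} \]

What are the mean and variance of the sample sum $Y = X_1 + X_2$?

Answer: The population mean is

\begin{align*}
\mu_X = E(X) &= \int_0^1 x \, 6x(1-x)\, dx \\
      &= 6 \int_0^1 x^2 (1-x)\, dx \\
      &= 6\, B(3,2) \qquad \text{(here B denotes the beta function)} \\
      &= 6\, \frac{\Gamma(3)\,\Gamma(2)}{\Gamma(5)} \\
      &= \frac{1}{2}.
\end{align*}

Since $X_1$ and $X_2$ have the same distribution, we obtain $\mu_{X_1} = \frac{1}{2} = \mu_{X_2}$.
Hence the mean of Y is given by

\begin{align*}
E(Y) &= E(X_1 + X_2) \\
     &= E(X_1) + E(X_2) \\
     &= \frac{1}{2} + \frac{1}{2} \\
     &= 1.
\end{align*}

Next, we compute the variance of the population X. The variance of X is
given by

\begin{align*}
Var(X) &= E\left(X^2\right) - E(X)^2 \\
       &= \int_0^1 6x^3 (1-x)\, dx - \left(\frac{1}{2}\right)^2 \\
       &= 6\, B(4,2) - \frac{1}{4} \\
       &= 6\, \frac{\Gamma(4)\,\Gamma(2)}{\Gamma(6)} - \frac{1}{4} \\
       &= \frac{6}{20} - \frac{5}{20} \\
       &= \frac{1}{20}.
\end{align*}

Since $X_1$ and $X_2$ have the same distribution as the population X, we get
$Var(X_1) = \frac{1}{20} = Var(X_2)$.

Hence, the variance of the sample sum Y is given by

\begin{align*}
Var(Y) &= Var(X_1 + X_2) \\
       &= Var(X_1) + Var(X_2) + 2\, Cov(X_1, X_2) \\
       &= Var(X_1) + Var(X_2) \qquad \text{(since $X_1$ and $X_2$ are independent)} \\
       &= \frac{1}{20} + \frac{1}{20} \\
       &= \frac{1}{10}.
\end{align*}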
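The arithmetic in Example 13.1 can be reproduced with exact fractions. This is a small sketch rather than part of the original text; the helper name beta_int is hypothetical, and it relies on the identity B(a, b) = (a-1)!(b-1)!/(a+b-1)! for positive integers a and b.

```python
from fractions import Fraction
from math import factorial

def beta_int(a, b):
    """Beta function B(a, b) for positive integers a and b, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

# Population moments for f(x) = 6x(1 - x) on (0, 1).
mean_x = 6 * beta_int(3, 2)           # E(X)   = 6 B(3, 2)
second_moment = 6 * beta_int(4, 2)    # E(X^2) = 6 B(4, 2)
var_x = second_moment - mean_x ** 2

# Sample sum Y = X1 + X2 of two independent observations.
mean_y = 2 * mean_x
var_y = 2 * var_x

if __name__ == "__main__":
    print(mean_x, var_x, mean_y, var_y)
```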


Example 13.2. Let $X_1$ and $X_2$ be a random sample of size 2 from a
distribution with density

\[ f(x) = \begin{cases} \frac{1}{4} & \text{for } x = 1, 2, 3, 4 \\ 0 & \text{otherwise.} \end{cases} \]

What is the distribution of the sample sum $Y = X_1 + X_2$?

Answer: Since the range space of $X_1$ as well as $X_2$ is {1, 2, 3, 4}, the range
space of $Y = X_1 + X_2$ is

\[ R_Y = \{2, 3, 4, 5, 6, 7, 8\}. \]

Let g(y) be the density function of Y. We want to find this density function.
First, we find g(2), g(3) and so on.

\begin{align*}
g(2) &= P(Y = 2) \\
     &= P(X_1 + X_2 = 2) \\
     &= P(X_1 = 1 \text{ and } X_2 = 1) \\
     &= P(X_1 = 1)\, P(X_2 = 1) \qquad \text{(by independence of $X_1$ and $X_2$)} \\
     &= f(1)\, f(1) \\
     &= \frac{1}{16}
\end{align*}

\begin{align*}
g(3) &= P(Y = 3) \\
     &= P(X_1 + X_2 = 3) \\
     &= P(X_1 = 1 \text{ and } X_2 = 2) + P(X_1 = 2 \text{ and } X_2 = 1) \\
     &= P(X_1 = 1)\, P(X_2 = 2) + P(X_1 = 2)\, P(X_2 = 1) \qquad \text{(by independence of $X_1$ and $X_2$)} \\
     &= f(1)\, f(2) + f(2)\, f(1) \\
     &= \frac{2}{16}
\end{align*}

\begin{align*}
g(4) &= P(Y = 4) \\
     &= P(X_1 + X_2 = 4) \\
     &= P(X_1 = 1 \text{ and } X_2 = 3) + P(X_1 = 3 \text{ and } X_2 = 1) + P(X_1 = 2 \text{ and } X_2 = 2) \\
     &= P(X_1 = 3)\, P(X_2 = 1) + P(X_1 = 1)\, P(X_2 = 3) + P(X_1 = 2)\, P(X_2 = 2) \qquad \text{(by independence of $X_1$ and $X_2$)} \\
     &= f(1)\, f(3) + f(3)\, f(1) + f(2)\, f(2) \\
     &= \frac{3}{16}.
\end{align*}

Similarly, we get

\[ g(5) = \frac{4}{16}, \qquad g(6) = \frac{3}{16}, \qquad g(7) = \frac{2}{16}, \qquad g(8) = \frac{1}{16}. \]

Thus, putting these into one expression, we get

\begin{align*}
g(y) &= P(Y = y) \\
     &= \sum_{k=1}^{y-1} f(k)\, f(y-k) \\
     &= \frac{4 - |y - 5|}{16}, \qquad y = 2, 3, 4, ..., 8.
\end{align*}

Remark 13.1. Note that $g(y) = \sum_{k=1}^{y-1} f(k)\, f(y-k)$ is the discrete convolution
of f with itself. The concept of convolution was introduced in Chapter 10.

The above example can also be done using the moment generating function
method as follows:

\begin{align*}
M_Y(t) &= M_{X_1 + X_2}(t) \\
       &= M_{X_1}(t)\, M_{X_2}(t) \\
       &= \left( \frac{e^{t} + e^{2t} + e^{3t} + e^{4t}}{4} \right) \left( \frac{e^{t} + e^{2t} + e^{3t} + e^{4t}}{4} \right) \\
       &= \left( \frac{e^{t} + e^{2t} + e^{3t} + e^{4t}}{4} \right)^{2} \\
       &= \frac{e^{2t} + 2e^{3t} + 3e^{4t} + 4e^{5t} + 3e^{6t} + 2e^{7t} + e^{8t}}{16}.
\end{align*}

Hence, the density of Y is given by

\[ g(y) = \frac{4 - |y - 5|}{16}, \qquad y = 2, 3, 4, ..., 8. \]
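The discrete convolution in Remark 13.1 is easy to carry out by machine. The sketch below is illustrative only (the function name convolve and the dictionary representation of a probability function are assumptions, not notation from the text); it compares the convolution of f with itself against the closed form obtained in Example 13.2.

```python
from fractions import Fraction

def convolve(f, g):
    """Discrete convolution of two probability functions given as {value: probability}."""
    out = {}
    for x, px in f.items():
        for y, py in g.items():
            out[x + y] = out.get(x + y, 0) + px * py
    return out

# f(x) = 1/4 for x = 1, 2, 3, 4, as in Example 13.2.
f = {x: Fraction(1, 4) for x in (1, 2, 3, 4)}
g = convolve(f, f)

# Closed form from the example: g(y) = (4 - |y - 5|) / 16 for y = 2, ..., 8.
closed_form = {y: Fraction(4 - abs(y - 5), 16) for y in range(2, 9)}

if __name__ == "__main__":
    for y in sorted(g):
        print(y, g[y], closed_form[y])
```

Because exact fractions are used, the two dictionaries can be compared with ordinary equality rather than a numerical tolerance.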


Theorem 13.1. If $X_1, X_2, ..., X_n$ are mutually independent random
variables with densities $f_1(x_1), f_2(x_2), ..., f_n(x_n)$ and $E[u_i(X_i)]$, i = 1, 2, ..., n
exist, then

\[ E\left[ \prod_{i=1}^{n} u_i(X_i) \right] = \prod_{i=1}^{n} E[u_i(X_i)], \]

where $u_i$ (i = 1, 2, ..., n) are arbitrary functions.

Proof: We prove the theorem assuming that the random variables
$X_1, X_2, ..., X_n$ are continuous. If the random variables are not continuous,
then the proof follows exactly in the same manner if one replaces the integrals
by summations. Since

\begin{align*}
E\left( \prod_{i=1}^{n} u_i(X_i) \right)
  &= E\left(u_1(X_1) \cdots u_n(X_n)\right) \\
  &= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} u_1(x_1) \cdots u_n(x_n)\, f(x_1, ..., x_n)\, dx_1 \cdots dx_n \\
  &= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} u_1(x_1) \cdots u_n(x_n)\, f_1(x_1) \cdots f_n(x_n)\, dx_1 \cdots dx_n \\
  &= \int_{-\infty}^{\infty} u_1(x_1) f_1(x_1)\, dx_1 \cdots \int_{-\infty}^{\infty} u_n(x_n) f_n(x_n)\, dx_n \\
  &= E\left(u_1(X_1)\right) \cdots E\left(u_n(X_n)\right) \\
  &= \prod_{i=1}^{n} E\left(u_i(X_i)\right),
\end{align*}

the proof of the theorem is now complete.

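Example 13.3. Let X and Y be two random variables with the joint density

\[ f(x,y) = \begin{cases} e^{-(x+y)} & \text{for } 0 < x, y < \infty \\ 0 & \text{otherwise.} \end{cases} \]

What is the expected value of the continuous random variable
$Z = X^2 Y^2 + X Y^2 + X^2 + X$?

Answer: Since

\begin{align*}
f(x,y) &= e^{-(x+y)} \\
       &= e^{-x}\, e^{-y} \\
       &= f_1(x)\, f_2(y),
\end{align*}

the random variables X and Y are mutually independent. Hence, the
expected value of X is

\begin{align*}
E(X) &= \int_0^{\infty} x\, f_1(x)\, dx \\
     &= \int_0^{\infty} x e^{-x}\, dx \\
     &= \Gamma(2) \\
     &= 1.
\end{align*}

Similarly, the expected value of $X^2$ is given by

\begin{align*}
E\left(X^2\right) &= \int_0^{\infty} x^2 f_1(x)\, dx \\
     &= \int_0^{\infty} x^2 e^{-x}\, dx \\
     &= \Gamma(3) \\
     &= 2.
\end{align*}

Since the marginals of X and Y are the same, we also get E(Y) = 1 and $E(Y^2) = 2$.
Further, by Theorem 13.1, we get

\begin{align*}
E[Z] &= E\left[X^2 Y^2 + X Y^2 + X^2 + X\right] \\
     &= E\left[\left(X^2 + X\right)\left(Y^2 + 1\right)\right] \\
     &= E\left[X^2 + X\right] E\left[Y^2 + 1\right] \qquad \text{(by Theorem 13.1)} \\
     &= \left(E\left[X^2\right] + E[X]\right)\left(E\left[Y^2\right] + 1\right) \\
     &= (2 + 1)(2 + 1) \\
     &= 9.
\end{align*}

Theorem 13.2. If $X_1, X_2, ..., X_n$ are mutually independent random
variables with respective means $\mu_1, \mu_2, ..., \mu_n$ and variances $\sigma_1^2, \sigma_2^2, ..., \sigma_n^2$, then
the mean and variance of $Y = \sum_{i=1}^{n} a_i X_i$, where $a_1, a_2, ..., a_n$ are real
constants, are given by

\[ \mu_Y = \sum_{i=1}^{n} a_i \mu_i \qquad \text{and} \qquad \sigma_Y^2 = \sum_{i=1}^{n} a_i^2 \sigma_i^2. \]

Proof: First we show that $\mu_Y = \sum_{i=1}^{n} a_i \mu_i$. Since

\begin{align*}
\mu_Y &= E(Y) \\
      &= E\left( \sum_{i=1}^{n} a_i X_i \right) \\
      &= \sum_{i=1}^{n} a_i E(X_i) \\
      &= \sum_{i=1}^{n} a_i \mu_i,
\end{align*}

we have the asserted result. Next we show $\sigma_Y^2 = \sum_{i=1}^{n} a_i^2 \sigma_i^2$. Since
$Cov(X_i, X_j) = 0$ for $i \ne j$, we have

\begin{align*}
\sigma_Y^2 &= Var(Y) \\
           &= Var\left( \sum_{i=1}^{n} a_i X_i \right) \\
           &= \sum_{i=1}^{n} a_i^2\, Var(X_i) \\
           &= \sum_{i=1}^{n} a_i^2 \sigma_i^2.
\end{align*}

This completes the proof of the theorem.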

Example 13.4. Let the independent random variables $X_1$ and $X_2$ have
means $\mu_1 = -4$ and $\mu_2 = 3$, respectively, and variances $\sigma_1^2 = 4$ and $\sigma_2^2 = 9$.
What are the mean and variance of $Y = 3X_1 - 2X_2$?

Answer: The mean of Y is

\begin{align*}
\mu_Y &= 3\mu_1 - 2\mu_2 \\
      &= 3(-4) - 2(3) \\
      &= -18.
\end{align*}

Similarly, the variance of Y is

\begin{align*}
\sigma_Y^2 &= (3)^2 \sigma_1^2 + (-2)^2 \sigma_2^2 \\
           &= 9\, \sigma_1^2 + 4\, \sigma_2^2 \\
           &= 9(4) + 4(9) \\
           &= 72.
\end{align*}
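Theorem 13.2 translates directly into a two-line computation. The following sketch uses a hypothetical helper, linear_combination_moments, and assumes (as the theorem does) that the random variables are mutually independent; it reproduces the numbers of Example 13.4.

```python
def linear_combination_moments(coeffs, means, variances):
    """Mean and variance of Y = sum(a_i * X_i) for mutually independent X_i."""
    mean = sum(a * m for a, m in zip(coeffs, means))
    variance = sum(a * a * v for a, v in zip(coeffs, variances))
    return mean, variance

if __name__ == "__main__":
    # Example 13.4: Y = 3 X1 - 2 X2 with means (-4, 3) and variances (4, 9).
    print(linear_combination_moments([3, -2], [-4, 3], [4, 9]))
```

Note that the coefficient -2 enters the variance through its square, so subtracting a variable increases the variance of the combination.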

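Example 13.5. Let $X_1, X_2, ..., X_{50}$ be a random sample of size 50 from a
distribution with density

\[ f(x) = \begin{cases} \frac{1}{\theta}\, e^{-\frac{x}{\theta}} & \text{for } 0 \le x < \infty \\ 0 & \text{otherwise.} \end{cases} \]

What are the mean and variance of the sample mean $\bar{X}$?

Answer: Since the distribution of the population X is exponential, the mean
and variance of X are given by

\[ \mu_X = \theta \qquad \text{and} \qquad \sigma_X^2 = \theta^2. \]

Thus, the mean of the sample mean is

\begin{align*}
E\left(\bar{X}\right) &= E\left( \frac{X_1 + X_2 + \cdots + X_{50}}{50} \right) \\
  &= \frac{1}{50} \sum_{i=1}^{50} E(X_i) \\
  &= \frac{1}{50} \sum_{i=1}^{50} \theta \\
  &= \frac{1}{50}\, 50\,\theta = \theta.
\end{align*}

The variance of the sample mean is given by

\begin{align*}
Var\left(\bar{X}\right) &= Var\left( \sum_{i=1}^{50} \frac{1}{50} X_i \right) \\
  &= \sum_{i=1}^{50} \left( \frac{1}{50} \right)^{2} \sigma_{X_i}^2 \\
  &= \sum_{i=1}^{50} \left( \frac{1}{50} \right)^{2} \theta^2 \\
  &= 50 \left( \frac{1}{50} \right)^{2} \theta^2 \\
  &= \frac{\theta^2}{50}.
\end{align*}

Theorem 13.3. If $X_1, X_2, ..., X_n$ are independent random variables with
respective moment generating functions $M_{X_i}(t)$, i = 1, 2, ..., n, then the
moment generating function of $Y = \sum_{i=1}^{n} a_i X_i$ is given by

\[ M_Y(t) = \prod_{i=1}^{n} M_{X_i}(a_i t). \]

Proof: Since

\begin{align*}
M_Y(t) &= M_{\sum_{i=1}^{n} a_i X_i}(t) \\
       &= \prod_{i=1}^{n} M_{a_i X_i}(t) \\
       &= \prod_{i=1}^{n} M_{X_i}(a_i t),
\end{align*}

we have the asserted result and the proof of the theorem is now complete.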

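Example 13.6. Let $X_1, X_2, ..., X_{10}$ be the observations from a random
sample of size 10 from a distribution with density

\[ f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} x^2}, \qquad -\infty < x < \infty. \]

What is the moment generating function of the sample mean?

Answer: The density of the population X is a standard normal. Hence, the
moment generating function of each $X_i$ is

\[ M_{X_i}(t) = e^{\frac{1}{2} t^2}, \qquad i = 1, 2, ..., 10. \]

The moment generating function of the sample mean is

\begin{align*}
M_{\bar{X}}(t) &= M_{\sum_{i=1}^{10} \frac{1}{10} X_i}(t) \\
  &= \prod_{i=1}^{10} M_{X_i}\left( \frac{1}{10}\, t \right) \\
  &= \prod_{i=1}^{10} e^{\frac{t^2}{200}} \\
  &= \left[ e^{\frac{t^2}{200}} \right]^{10} \\
  &= e^{\frac{1}{10} \frac{t^2}{2}}.
\end{align*}

Hence $\bar{X} \sim N\left(0, \frac{1}{10}\right)$.

The last example tells us that if we take a sample of any size from
a standard normal population, then the sample mean also has a normal
distribution.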

The following theorem says that a linear combination of random variables 
with normal distributions is again normal. 

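Theorem 13.4. If $X_1, X_2, ..., X_n$ are mutually independent random
variables such that

\[ X_i \sim N\left(\mu_i, \sigma_i^2\right), \qquad i = 1, 2, ..., n, \]

then the random variable $Y = \sum_{i=1}^{n} a_i X_i$ is a normal random variable with
mean and variance

\[ \mu_Y = \sum_{i=1}^{n} a_i \mu_i \qquad \text{and} \qquad \sigma_Y^2 = \sum_{i=1}^{n} a_i^2 \sigma_i^2, \]

that is $Y \sim N\left( \sum_{i=1}^{n} a_i \mu_i, \; \sum_{i=1}^{n} a_i^2 \sigma_i^2 \right)$.

Proof: Since each $X_i \sim N(\mu_i, \sigma_i^2)$, the moment generating function of each
$X_i$ is given by

\[ M_{X_i}(t) = e^{\mu_i t + \frac{1}{2} \sigma_i^2 t^2}. \]

Hence using Theorem 13.3, we have

\begin{align*}
M_Y(t) &= \prod_{i=1}^{n} M_{X_i}(a_i t) \\
       &= \prod_{i=1}^{n} e^{a_i \mu_i t + \frac{1}{2} a_i^2 \sigma_i^2 t^2} \\
       &= e^{\left( \sum_{i=1}^{n} a_i \mu_i \right) t + \frac{1}{2} \left( \sum_{i=1}^{n} a_i^2 \sigma_i^2 \right) t^2}.
\end{align*}

Thus the random variable $Y \sim N\left( \sum_{i=1}^{n} a_i \mu_i, \; \sum_{i=1}^{n} a_i^2 \sigma_i^2 \right)$. The proof of the
theorem is now complete.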

Example 13.7. Let $X_1, X_2, ..., X_n$ be the observations from a random
sample of size n from a normal distribution with mean $\mu$ and variance $\sigma^2 > 0$.
What are the mean and variance of the sample mean $\bar{X}$?

Answer: The expected value (or mean) of the sample mean is given by

\begin{align*}
E\left(\bar{X}\right) &= \frac{1}{n} \sum_{i=1}^{n} E(X_i) \\
  &= \frac{1}{n} \sum_{i=1}^{n} \mu \\
  &= \mu.
\end{align*}

Similarly, the variance of the sample mean is

\[ Var\left(\bar{X}\right) = \sum_{i=1}^{n} Var\left( \frac{X_i}{n} \right) = \sum_{i=1}^{n} \left( \frac{1}{n} \right)^{2} \sigma^2 = \frac{\sigma^2}{n}. \]

This example along with the previous theorem says that if we take a random
sample of size n from a normal population with mean $\mu$ and variance $\sigma^2$,
then the sample mean is also normal with mean $\mu$ and variance $\frac{\sigma^2}{n}$, that is

\[ \bar{X} \sim N\left( \mu, \frac{\sigma^2}{n} \right). \]

Example 13.8. Let $X_1, X_2, ..., X_{64}$ be a random sample of size 64 from a
normal distribution with $\mu = 50$ and $\sigma^2 = 16$. What are $P(49 < X_8 < 51)$
and $P(49 < \bar{X} < 51)$?

Answer: Since $X_8 \sim N(50, 16)$, we get

\begin{align*}
P(49 < X_8 < 51) &= P(49 - 50 < X_8 - 50 < 51 - 50) \\
  &= P\left( \frac{49 - 50}{4} < \frac{X_8 - 50}{4} < \frac{51 - 50}{4} \right) \\
  &= P\left( -\frac{1}{4} < Z < \frac{1}{4} \right) \\
  &= 2\, P\left( Z < \frac{1}{4} \right) - 1 \\
  &= 0.1974 \qquad \text{(from normal table).}
\end{align*}

By the previous theorem, we see that $\bar{X} \sim N\left(50, \frac{16}{64}\right)$. Hence

\begin{align*}
P(49 < \bar{X} < 51) &= P(49 - 50 < \bar{X} - 50 < 51 - 50) \\
  &= P\left( \frac{49 - 50}{\sqrt{\frac{16}{64}}} < \frac{\bar{X} - 50}{\sqrt{\frac{16}{64}}} < \frac{51 - 50}{\sqrt{\frac{16}{64}}} \right) \\
  &= P(-2 < Z < 2) \\
  &= 2\, P(Z < 2) - 1 \\
  &= 0.9544 \qquad \text{(from normal table).}
\end{align*}

This example tells us that $\bar{X}$ has a greater probability of falling in an interval
containing $\mu$ than a single observation, say $X_8$ (or in general any $X_i$).
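The two table lookups in Example 13.8 can be replaced by a direct evaluation of the normal distribution function. The sketch below is an illustration under the assumption that the standard normal distribution function is computed from the error function in Python's math module; the helper names are hypothetical.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Distribution function of N(mu, sigma^2), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def interval_prob(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2)."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

# Single observation X_8 ~ N(50, 16): standard deviation 4.
p_single = interval_prob(49, 51, 50, 4.0)

# Sample mean of 64 observations ~ N(50, 16/64): standard deviation 4 / sqrt(64).
p_mean = interval_prob(49, 51, 50, 4.0 / math.sqrt(64))

if __name__ == "__main__":
    print(round(p_single, 4), round(p_mean, 4))
```

The second value may differ from the tabulated 0.9544 in the fourth decimal place, because the table rounds P(Z < 2) before it is doubled.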

Theorem 13.5. Let the distributions of the random variables $X_1, X_2, ..., X_n$
be $\chi^2(r_1), \chi^2(r_2), ..., \chi^2(r_n)$, respectively. If $X_1, X_2, ..., X_n$ are mutually
independent, then $Y = X_1 + X_2 + \cdots + X_n \sim \chi^2\left( \sum_{i=1}^{n} r_i \right)$.

Proof: Since each $X_i \sim \chi^2(r_i)$, the moment generating function of each $X_i$
is given by

\[ M_{X_i}(t) = (1 - 2t)^{-\frac{r_i}{2}}. \]

By Theorem 13.3, we have

\[ M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t) = \prod_{i=1}^{n} (1 - 2t)^{-\frac{r_i}{2}} = (1 - 2t)^{-\frac{1}{2} \sum_{i=1}^{n} r_i}. \]

Hence $Y \sim \chi^2\left( \sum_{i=1}^{n} r_i \right)$ and the proof of the theorem is now complete.

The proof of the following theorem is an easy consequence of Theorem
13.5 and we leave the proof to the reader.

Theorem 13.6. If $Z_1, Z_2, ..., Z_n$ are mutually independent and each one
is standard normal, then $Z_1^2 + Z_2^2 + \cdots + Z_n^2 \sim \chi^2(n)$, that is the sum is
chi-square with n degrees of freedom.

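For our next theorem, we write

\[ \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \qquad \text{and} \qquad S_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \bar{X}_n\right)^2. \]

Hence

\[ \bar{X}_2 = \frac{1}{2}(X_1 + X_2) \]

and

\begin{align*}
S_2^2 &= \left(X_1 - \bar{X}_2\right)^2 + \left(X_2 - \bar{X}_2\right)^2 \\
      &= \frac{1}{4}(X_1 - X_2)^2 + \frac{1}{4}(X_2 - X_1)^2 \\
      &= \frac{1}{2}(X_1 - X_2)^2.
\end{align*}

Further, it can be shown that

\[ \bar{X}_{n+1} = \frac{n \bar{X}_n + X_{n+1}}{n + 1} \tag{13.1} \]

and

\[ n\, S_{n+1}^2 = (n-1)\, S_n^2 + \frac{n}{n+1} \left(X_{n+1} - \bar{X}_n\right)^2. \tag{13.2} \]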
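The recursions (13.1) and (13.2) also give a one-pass way of updating the sample mean and sample variance as observations arrive. The following is a minimal sketch with hypothetical function names; it starts from the n = 2 formulas above and compares the result with the direct definitions as implemented in Python's statistics module.

```python
import statistics

def update(n, mean_n, var_n, x_new):
    """One step of (13.1) and (13.2): from (n, mean, S^2) to the values for n + 1."""
    mean_next = (n * mean_n + x_new) / (n + 1)
    var_next = ((n - 1) * var_n + n / (n + 1) * (x_new - mean_n) ** 2) / n
    return n + 1, mean_next, var_next

def running_mean_variance(data):
    """Apply the recursions to a list with at least two observations."""
    x1, x2 = data[0], data[1]
    n, mean, var = 2, (x1 + x2) / 2, (x1 - x2) ** 2 / 2   # the n = 2 case
    for x in data[2:]:
        n, mean, var = update(n, mean, var, x)
    return mean, var

if __name__ == "__main__":
    sample = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
    print(running_mean_variance(sample))
    print(statistics.mean(sample), statistics.variance(sample))
```

Both printed lines should agree up to floating-point rounding.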

The following theorem is very useful in mathematical statistics. In order
to prove this theorem we need the following result, which can be established
with some effort: two linear combinations of a pair of independent
normally distributed random variables are themselves bivariate normal, and
hence if they are uncorrelated, they are independent. The proof of the
following theorem is based on the inductive proof by Stigler (1984).

Theorem 13.7. If $X_1, X_2, ..., X_n$ is a random sample of size n from the
normal distribution $N(\mu, \sigma^2)$, then the sample mean $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$ and
the sample variance $S_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \bar{X}_n\right)^2$ have the following properties:

(a) $\frac{(n-1)\, S_n^2}{\sigma^2} \sim \chi^2(n-1)$, and

(b) $\bar{X}_n$ and $S_n^2$ are independent.

Proof: We prove this theorem by induction. First, consider the case n = 2.
Since each $X_i \sim N(\mu, \sigma^2)$ for i = 1, 2, ..., n, we have $X_1 + X_2 \sim N(2\mu, 2\sigma^2)$
and $X_1 - X_2 \sim N(0, 2\sigma^2)$. Hence

\[ \frac{X_1 - X_2}{\sqrt{2\sigma^2}} \sim N(0, 1) \]

and therefore

\[ \frac{1}{2}\, \frac{(X_1 - X_2)^2}{\sigma^2} \sim \chi^2(1). \]

This proves (a), that is, $\frac{S_2^2}{\sigma^2} \sim \chi^2(1)$.

Since $X_1$ and $X_2$ are independent,

\begin{align*}
Cov(X_1 + X_2,\, X_1 - X_2)
  &= Cov(X_1, X_1) + Cov(X_1, X_2) - Cov(X_2, X_1) - Cov(X_2, X_2) \\
  &= \sigma^2 + 0 - 0 - \sigma^2 \\
  &= 0.
\end{align*}

Therefore $X_1 + X_2$ and $X_1 - X_2$ are uncorrelated bivariate normal random
variables. Hence they are independent. Thus $\frac{1}{2}(X_1 + X_2)$ and $\frac{1}{2}(X_1 - X_2)^2$
are independent random variables. This proves (b), that is $\bar{X}_2$ and $S_2^2$ are
independent.

Now assume the conclusion (that is (a) and (b)) holds for the sample of
size n. We prove it holds for a sample of size n + 1.

Since $X_1, X_2, ..., X_{n+1}$ are independent and each $X_i \sim N(\mu, \sigma^2)$, we have
$\bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right)$. Moreover $\bar{X}_n$ and $X_{n+1}$ are independent. Hence by
(13.1), $\bar{X}_{n+1}$ is a linear combination of the independent random variables $\bar{X}_n$
and $X_{n+1}$.

The linear combination $X_{n+1} - \bar{X}_n$ of the random variables $X_{n+1}$ and
$\bar{X}_n$ is a normal random variable with mean 0 and variance $\frac{n+1}{n} \sigma^2$. Hence

\[ \frac{X_{n+1} - \bar{X}_n}{\sqrt{\frac{n+1}{n}\, \sigma^2}} \sim N(0, 1). \]

Therefore

\[ \frac{n}{n+1}\, \frac{\left(X_{n+1} - \bar{X}_n\right)^2}{\sigma^2} \sim \chi^2(1). \]

Since $X_{n+1}$ and $S_n^2$ are independent random variables, and by the induction
hypothesis $\bar{X}_n$ and $S_n^2$ are independent, dividing (13.2) by $\sigma^2$ we get

\begin{align*}
\frac{n\, S_{n+1}^2}{\sigma^2} &= \frac{(n-1)\, S_n^2}{\sigma^2} + \frac{n}{n+1}\, \frac{\left(X_{n+1} - \bar{X}_n\right)^2}{\sigma^2} \\
  &= \chi^2(n-1) + \chi^2(1) \\
  &= \chi^2(n).
\end{align*}

Hence (a) follows.

Finally, the induction hypothesis and the fact that

\[ \bar{X}_{n+1} = \frac{n \bar{X}_n + X_{n+1}}{n + 1} \]

show that $\bar{X}_{n+1}$ is independent of $S_n^2$. Since

\begin{align*}
Cov\left(n \bar{X}_n + X_{n+1},\; X_{n+1} - \bar{X}_n\right)
  &= n\, Cov\left(\bar{X}_n, X_{n+1}\right) - n\, Cov\left(\bar{X}_n, \bar{X}_n\right) + Cov\left(X_{n+1}, X_{n+1}\right) - Cov\left(X_{n+1}, \bar{X}_n\right) \\
  &= 0 - n\, \frac{\sigma^2}{n} + \sigma^2 - 0 = 0,
\end{align*}

the random variables $n \bar{X}_n + X_{n+1}$ and $X_{n+1} - \bar{X}_n$ are uncorrelated. Since
these two random variables are normal, they are independent.
Hence $(n \bar{X}_n + X_{n+1})/(n+1)$ and $(X_{n+1} - \bar{X}_n)^2/(n+1)$ are also independent.
Since $\bar{X}_{n+1}$ and $S_n^2$ are independent, it follows that $\bar{X}_{n+1}$ and

\[ \frac{n-1}{n}\, S_n^2 + \frac{1}{n+1} \left(X_{n+1} - \bar{X}_n\right)^2 \]

are independent and hence $\bar{X}_{n+1}$ and $S_{n+1}^2$ are independent. This proves (b)
and the proof of the theorem is now complete.

Remark 13.2. At first sight the statement (b) might seem odd since the
sample mean $\bar{X}_n$ occurs explicitly in the definition of the sample variance
$S_n^2$. This remarkable independence of $\bar{X}_n$ and $S_n^2$ is a unique property that
distinguishes the normal distribution from all other probability distributions.


Example 13.9. Let $X_1, X_2, ..., X_n$ denote a random sample from a normal
distribution with variance $\sigma^2 > 0$. If the first percentile of the statistic
$W = \sum_{i=1}^{n} \frac{\left(X_i - \bar{X}\right)^2}{\sigma^2}$ is 1.24, where $\bar{X}$ denotes the sample mean, what is the
sample size n?

Answer:

\begin{align*}
\frac{1}{100} &= P(W \le 1.24) \\
  &= P\left( \sum_{i=1}^{n} \frac{\left(X_i - \bar{X}\right)^2}{\sigma^2} \le 1.24 \right) \\
  &= P\left( (n-1)\, \frac{S^2}{\sigma^2} \le 1.24 \right) \\
  &= P\left( \chi^2(n-1) \le 1.24 \right).
\end{align*}

Thus from the $\chi^2$-table, we get

\[ n - 1 = 7 \]

and hence the sample size n is 8.

Example 13.10. Let $X_1, X_2, X_3, X_4$ be a random sample from a normal
distribution with unknown mean and variance equal to 9. Let
$S^2 = \frac{1}{3} \sum_{i=1}^{4} \left(X_i - \bar{X}\right)^2$. If $P\left(S^2 \le k\right) = 0.05$, then what is k?

Answer:

\begin{align*}
0.05 &= P\left(S^2 \le k\right) \\
     &= P\left( \frac{3 S^2}{9} \le \frac{3}{9}\, k \right) \\
     &= P\left( \chi^2(3) \le \frac{3}{9}\, k \right).
\end{align*}

From the $\chi^2$-table with 3 degrees of freedom, we get

\[ \frac{3}{9}\, k = 0.35 \]

and thus the constant k is given by

\[ k = 3(0.35) = 1.05. \]


13.2. Laws of Large Numbers 

In this section, we mainly examine the weak law of large numbers. The
weak law of large numbers states that if $X_1, X_2, ..., X_n$ is a random sample
of size n from a population X with mean $\mu$, then the sample mean $\bar{X}$ rarely
deviates from the population mean $\mu$ when the sample size n is very large. In
other words, the sample mean $\bar{X}$ converges in probability to the population
mean $\mu$. We begin this section with a result known as the Markov inequality,
which is needed to establish the weak law of large numbers.

Theorem 13.8 (Markov Inequality). Suppose X is a nonnegative random
variable with mean E(X). Then

\[ P(X \ge t) \le \frac{E(X)}{t} \]

for all t > 0.

Proof: We assume the random variable X is continuous. If X is not
continuous, then a proof can be obtained for this case by replacing the integrals
with summations in the following proof. Since

\begin{align*}
E(X) &= \int_{-\infty}^{\infty} x f(x)\, dx \\
     &= \int_{-\infty}^{t} x f(x)\, dx + \int_{t}^{\infty} x f(x)\, dx \\
     &\ge \int_{t}^{\infty} x f(x)\, dx \\
     &\ge \int_{t}^{\infty} t f(x)\, dx \qquad \text{because } x \in [t, \infty) \\
     &= t \int_{t}^{\infty} f(x)\, dx \\
     &= t\, P(X \ge t),
\end{align*}

we see that

\[ P(X \ge t) \le \frac{E(X)}{t}. \]

This completes the proof of the theorem.


In Theorem 4.4 of Chapter 4, the Chebychev inequality was treated. Let
X be a random variable with mean $\mu$ and standard deviation $\sigma$. Then the
Chebychev inequality says that

\[ P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2} \]

for any nonzero positive constant k. This result can be obtained easily using
Theorem 13.8 as follows. By the Markov inequality, we have

\[ P\left((X - \mu)^2 \ge t^2\right) \le \frac{E\left((X - \mu)^2\right)}{t^2} \]

for all t > 0. Since the events $(X - \mu)^2 \ge t^2$ and $|X - \mu| \ge t$ are the same, we
get

\[ P\left((X - \mu)^2 \ge t^2\right) = P(|X - \mu| \ge t) \le \frac{E\left((X - \mu)^2\right)}{t^2} \]

for all t > 0. Hence

\[ P(|X - \mu| \ge t) \le \frac{\sigma^2}{t^2}. \]

Letting $t = k\sigma$ in the above inequality, we see that

\[ P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}. \]

Hence

\[ 1 - P(|X - \mu| < k\sigma) \le \frac{1}{k^2}. \]

The last inequality yields the Chebychev inequality

\[ P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2}. \]
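To see how conservative these bounds can be, one may compare them with exact tail probabilities for a specific distribution. The sketch below assumes, purely for illustration, an exponential random variable with mean 1 (so that $\mu = \sigma = 1$); this choice and the function names are not from the text.

```python
import math

def markov_bound(mean, t):
    """Upper bound E(X)/t on P(X >= t) for a nonnegative random variable."""
    return mean / t

def exp_tail(t):
    """Exact P(X >= t) for an exponential random variable with mean 1."""
    return math.exp(-t) if t > 0 else 1.0

def exp_two_sided(k):
    """Exact P(|X - 1| >= k) for the same variable, where mu = sigma = 1."""
    lower = 1.0 - math.exp(-(1.0 - k)) if k < 1 else 0.0   # P(X <= 1 - k)
    upper = math.exp(-(1.0 + k))                           # P(X >= 1 + k)
    return lower + upper

if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0, 5.0):
        print("Markov   ", t, exp_tail(t), markov_bound(1.0, t))
    for k in (0.5, 1.0, 2.0, 3.0):
        print("Chebychev", k, exp_two_sided(k), 1.0 / k ** 2)
```

In each printed row the exact probability (third column) does not exceed the bound (fourth column), and for large t or k the bound is far from tight.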

Now we are ready to treat the weak law of large numbers. 

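Theorem 13.9. Let $X_1, X_2, ...$ be a sequence of independent and identically
distributed random variables with $\mu = E(X_i)$ and $\sigma^2 = Var(X_i) < \infty$ for
i = 1, 2, .... Then

\[ \lim_{n \to \infty} P\left(|\bar{S}_n - \mu| \ge \varepsilon\right) = 0 \]

for every $\varepsilon > 0$. Here $\bar{S}_n$ denotes $\frac{X_1 + X_2 + \cdots + X_n}{n}$.

Proof: By Theorem 13.2 (or Example 13.7) we have

\[ E\left(\bar{S}_n\right) = \mu \qquad \text{and} \qquad Var\left(\bar{S}_n\right) = \frac{\sigma^2}{n}. \]

By Chebychev's inequality

\[ P\left(|\bar{S}_n - E(\bar{S}_n)| \ge \varepsilon\right) \le \frac{Var\left(\bar{S}_n\right)}{\varepsilon^2} \]

for $\varepsilon > 0$. Hence

\[ P\left(|\bar{S}_n - \mu| \ge \varepsilon\right) \le \frac{\sigma^2}{n\, \varepsilon^2}. \]

Taking the limit as n tends to infinity, we get

\[ \lim_{n \to \infty} P\left(|\bar{S}_n - \mu| \ge \varepsilon\right) \le \lim_{n \to \infty} \frac{\sigma^2}{n\, \varepsilon^2} \]

which yields

\[ \lim_{n \to \infty} P\left(|\bar{S}_n - \mu| \ge \varepsilon\right) = 0 \]

and the proof of the theorem is now complete.

It is possible to prove the weak law of large numbers assuming only that
E(X) exists and is finite, but the proof is more involved.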

The weak law of large numbers says that the sequence of sample means
$\bar{S}_n$ from a population X stays close to the population mean E(X) most
of the time. Let us consider an experiment that consists of tossing a coin
infinitely many times. Let $X_i$ be 1 if the $i$th toss results in a Head, and 0
otherwise. The weak law of large numbers says that

\[ \bar{S}_n = \frac{X_1 + X_2 + \cdots + X_n}{n} \to \frac{1}{2} \qquad \text{as } n \to \infty \tag{13.3} \]

but it is easy to come up with sequences of tosses for which (13.3) is false:

H H H H H H H H H H H H ...

H H T H H T H H T H H T ...

The strong law of large numbers (Theorem 13.11) states that the set of “bad
sequences” like the ones given above has probability zero.
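The two "bad sequences" above are periodic, so their running means can be computed directly. In this short sketch the function name and the encoding of tosses as the characters "H" and "T" are illustrative assumptions.

```python
from itertools import cycle, islice

def running_mean(pattern, n):
    """Mean of the first n tosses of a periodic sequence (H counts as 1, T as 0)."""
    tosses = islice(cycle(pattern), n)
    return sum(1 if toss == "H" else 0 for toss in tosses) / n

if __name__ == "__main__":
    for n in (30, 300, 3000):
        print(n, running_mean("H", n), running_mean("HHT", n))
```

The first sequence has running mean 1 for every n and the second has running mean 2/3 whenever n is a multiple of 3, so neither approaches 1/2.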

Note that the assertion of Theorem 13.9 for any $\varepsilon > 0$ can also be written
as

\[ \lim_{n \to \infty} P\left(|\bar{S}_n - \mu| < \varepsilon\right) = 1. \]

The type of convergence we saw in the weak law of large numbers is not
the type of convergence discussed in calculus. This type of convergence is
called convergence in probability and is defined as follows.

Definition 13.1. Suppose $X_1, X_2, ...$ is a sequence of random variables
defined on a sample space S. The sequence converges in probability to the
random variable X if, for any $\varepsilon > 0$,

\[ \lim_{n \to \infty} P\left(|X_n - X| < \varepsilon\right) = 1. \]

In view of the above definition, the weak law of large numbers states that
the sample mean $\bar{X}$ converges in probability to the population mean $\mu$.

The following theorem is known as the Bernoulli law of large numbers 
and is a special case of the weak law of large numbers. 

Theorem 13.10. Let $X_1, X_2, ...$ be a sequence of independent and identically
distributed Bernoulli random variables with probability of success p. Then,
for any $\varepsilon > 0$,

\[ \lim_{n \to \infty} P\left(|\bar{S}_n - p| < \varepsilon\right) = 1 \]

where $\bar{S}_n$ denotes $\frac{X_1 + X_2 + \cdots + X_n}{n}$.

The fact that the relative frequency of occurrence of an event E is very
likely to be close to its probability P(E) for large n can be derived from
the weak law of large numbers. Consider a repeatable random experiment
repeated a large number of times independently. Let $X_i = 1$ if E occurs on the
$i$th repetition and $X_i = 0$ if E does not occur on the $i$th repetition. Then

\[ \mu = E(X_i) = 1 \cdot P(E) + 0 \cdot P(E^c) = P(E) \qquad \text{for } i = 1, 2, 3, ... \]

and

\[ X_1 + X_2 + \cdots + X_n = N(E) \]

where N(E) denotes the number of times E occurs. Hence by the weak law
of large numbers, we have

\begin{align*}
\lim_{n \to \infty} P\left( \left| \frac{N(E)}{n} - P(E) \right| \ge \varepsilon \right)
  &= \lim_{n \to \infty} P\left( \left| \frac{X_1 + X_2 + \cdots + X_n}{n} - \mu \right| \ge \varepsilon \right) \\
  &= \lim_{n \to \infty} P\left( |\bar{S}_n - \mu| \ge \varepsilon \right) \\
  &= 0.
\end{align*}

Hence, for large n, the relative frequency of occurrence of the event E is very
likely to be close to its probability P(E).

Now we present the strong law of large numbers without a proof. 

Theorem 13.11. Let $X_1, X_2, ...$ be a sequence of independent and identically
distributed random variables with $\mu = E(X_i)$ and $\sigma^2 = Var(X_i) < \infty$ for
i = 1, 2, .... Then

\[ P\left( \lim_{n \to \infty} \bar{S}_n = \mu \right) = 1. \]

Here $\bar{S}_n$ denotes $\frac{X_1 + X_2 + \cdots + X_n}{n}$.

The type of convergence in Theorem 13.11 is called almost sure convergence.
The notion of almost sure convergence is defined as follows.

Definition 13.2. Suppose the random variable X and the sequence
$X_1, X_2, ...$ of random variables are defined on a sample space S. The
sequence $X_n(w)$ converges almost surely to X(w) if

\[ P\left( \left\{ w \in S \;\middle|\; \lim_{n \to \infty} X_n(w) = X(w) \right\} \right) = 1. \]

It can be shown that almost sure convergence implies convergence in
probability, but the converse does not hold.

13.3. The Central Limit Theorem 

Consider a random sample of measurement {X,}” =1 . The Xj’s are iden¬ 
tically distributed and their common distribution is the distribution of the 
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population. We have seen that if the population distribution is normal, then 
the sample mean X is also normal. More precisely, if Xi,X 2 , ...,X n is a 
random sample from a normal distribution with density 


f(x) = 



(^) 


2 


then 



The central limit theorem (also known as Lindeberg-Levy Theorem) states 
that even though the population distribution may be far from being normal, 
yet for large sample size n, the distribution of the standardized sample mean 
is approximately standard normal with better approximations obtained with 
the larger sample size. Mathematically this can be stated as follows. 


Theorem 13.12 (Central Limit Theorem). Let $X_1, X_2, \ldots, X_n$ be a random
sample of size $n$ from a distribution with mean $\mu$ and variance $\sigma^2 < \infty$.
Then the limiting distribution of

$$Z_n = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}}$$

is standard normal, that is, $Z_n$ converges in distribution to $Z$, where $Z$ denotes
a standard normal random variable.
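A simulation can make the statement of the theorem concrete. The sketch below assumes an exponential population with $\mu = \sigma = 1$ (a deliberately non-normal choice made here for illustration) and compares the empirical value of $P(Z_n \le 1)$ with $\Phi(1)$; the function names, sample size, and number of repetitions are hypothetical.

```python
import math
import random
from statistics import NormalDist

def standardized_mean(rng, n):
    """Z_n = (sample mean - mu) / (sigma / sqrt(n)) for an Exp(1) sample (mu = sigma = 1)."""
    xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return (xbar - 1.0) / (1.0 / math.sqrt(n))

def empirical_cdf_at(x, n, reps=4000, seed=0):
    """Fraction of simulated Z_n values that are <= x."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(reps) if standardized_mean(rng, n) <= x)
    return hits / reps

# Empirical P(Z_n <= 1) for n = 100 next to the standard normal value Phi(1).
print(empirical_cdf_at(1.0, 100), NormalDist().cdf(1.0))
```

The agreement is only approximate: it is limited both by the finite sample size `n` and by Monte Carlo error from the finite number of repetitions.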

The type of convergence used in the central limit theorem is called
convergence in distribution and is defined as follows.

Definition 13.3. Suppose $X$ is a random variable with cumulative distribution
function $F(x)$ and $X_1, X_2, \ldots$ is a sequence of random variables with
cumulative distribution functions $F_1(x), F_2(x), \ldots$, respectively. The sequence $X_n$
converges in distribution to $X$ if

$$\lim_{n\to\infty} F_n(x) = F(x)$$

for all values $x$ at which $F(x)$ is continuous. The distribution of $X$ is called
the limiting distribution of $X_n$.

Whenever a sequence of random variables $X_1, X_2, \ldots$ converges in distribution
to the random variable $X$, it will be denoted by $X_n \xrightarrow{d} X$.
Example 13.11. Let $Y = X_1 + X_2 + \cdots + X_{15}$ be the sum of a random
sample of size 15 from the distribution whose density function is

$$f(x) = \begin{cases} \frac{3}{2}\, x^2 & \text{if } -1 < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$

What is the approximate value of $P(-0.3 < Y < 1.5)$ when one uses the
central limit theorem?

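Answer: First, we find the mean $\mu$ and variance $\sigma^2$ for the density function
$f(x)$. The mean for this distribution is given by

$$\mu = \int_{-1}^{1} \frac{3}{2}\, x^3 \, dx = \frac{3}{2} \left[\frac{x^4}{4}\right]_{-1}^{1} = 0.$$

Hence the variance of this distribution is given by

$$\begin{aligned}
Var(X) &= E(X^2) - [E(X)]^2 \\
&= \int_{-1}^{1} \frac{3}{2}\, x^4 \, dx \\
&= \frac{3}{2} \left[\frac{x^5}{5}\right]_{-1}^{1} \\
&= \frac{3}{5} \\
&= 0.6.
\end{aligned}$$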


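$$\begin{aligned}
P(-0.3 < Y < 1.5) &= P(-0.3 - 0 < Y - 0 < 1.5 - 0) \\
&= P\left(\frac{-0.3}{\sqrt{15(0.6)}} < \frac{Y - 0}{\sqrt{15(0.6)}} < \frac{1.5}{\sqrt{15(0.6)}}\right) \\
&= P(-0.10 < Z < 0.50) \\
&= P(Z < 0.50) + P(Z < 0.10) - 1 \\
&= 0.6915 + 0.5398 - 1 \\
&= 0.2313.
\end{aligned}$$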
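The table lookup in Example 13.11 can be cross-checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the printed normal table; the variable names are chosen here for illustration.

```python
import math
from statistics import NormalDist

n, mu, var = 15, 0.0, 0.6        # sample size, mean and variance of f(x) = (3/2) x^2
sd_sum = math.sqrt(n * var)      # standard deviation of Y = X_1 + ... + X_15
z = NormalDist()                 # standard normal distribution

# P(-0.3 < Y < 1.5) under the normal approximation for the sum Y.
prob_1311 = z.cdf((1.5 - n * mu) / sd_sum) - z.cdf((-0.3 - n * mu) / sd_sum)
print(round(prob_1311, 4))
```

The result should agree with the tabulated answer up to the rounding of the table entries.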


Example 13.12. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n = 25$ from
a population that has a mean $\mu = 71.43$ and variance $\sigma^2 = 56.25$. Let $\overline{X}$ be
the sample mean. What is the probability that the sample mean is between
68.91 and 71.97?

Answer: The mean of $\overline{X}$ is given by $E(\overline{X}) = 71.43$. The variance of $\overline{X}$ is
given by

$$Var(\overline{X}) = \frac{\sigma^2}{n} = \frac{56.25}{25} = 2.25.$$

In order to find the probability that the sample mean is between 68.91 and
71.97, we need the distribution of the population. However, the population
distribution is unknown. Therefore, we use the central limit theorem. The
central limit theorem says that $\frac{\overline{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$ as $n$ approaches infinity.
Therefore

$$\begin{aligned}
P(68.91 < \overline{X} < 71.97)
&= P\left(\frac{68.91 - 71.43}{\sqrt{2.25}} < \frac{\overline{X} - 71.43}{\sqrt{2.25}} < \frac{71.97 - 71.43}{\sqrt{2.25}}\right) \\
&= P(-1.68 < W < 0.36) \\
&= P(W < 0.36) + P(W < 1.68) - 1 \\
&= 0.5941.
\end{aligned}$$
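As a numerical check of Example 13.12, the standardized limits and the resulting probability can be recomputed directly. This is a sketch using the standard library's `NormalDist` in place of the table; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, var, n = 71.43, 56.25, 25
se = math.sqrt(var / n)                  # standard deviation of the sample mean, 1.5

# Standardized limits for 68.91 and 71.97.
lower = (68.91 - mu) / se
upper = (71.97 - mu) / se
prob_1312 = NormalDist().cdf(upper) - NormalDist().cdf(lower)
print(round(lower, 2), round(upper, 2), round(prob_1312, 4))
```

The lower limit evaluates to about $-1.68$, which is the value consistent with the final answer 0.5941.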

Example 13.13. Light bulbs are installed successively into a socket. If we 
assume that each light bulb has a mean life of 2 months with a standard 
deviation of 0.25 months, what is the probability that 40 bulbs last at least 
7 years? 

Answer: Let $X_i$ denote the life time of the $i$th bulb installed. The 40 light
bulbs last a total time of

$$S_{40} = X_1 + X_2 + \cdots + X_{40}.$$

By the central limit theorem

$$\frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n \sigma^2}} \sim N(0, 1) \qquad \text{as } n \to \infty.$$

Thus

$$\frac{S_{40} - (40)(2)}{\sqrt{(40)(0.25)^2}} \sim N(0, 1).$$

That is

$$\frac{S_{40} - 80}{1.581} \sim N(0, 1).$$

Therefore

$$\begin{aligned}
P(S_{40} \geq 7(12)) &= P\left(\frac{S_{40} - 80}{1.581} \geq \frac{84 - 80}{1.581}\right) \\
&= P(Z \geq 2.530) \\
&= 0.0057.
\end{aligned}$$


Example 13.14. Light bulbs are installed into a socket. Assume that each 
has a mean life of 2 months with standard deviation of 0.25 month. How 
many bulbs n should be bought so that one can be 95% sure that the supply 
of n bulbs will last 5 years? 

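Answer: Let $X_i$ denote the life time of the $i$th bulb installed. The $n$ light
bulbs last a total time of

$$S_n = X_1 + X_2 + \cdots + X_n.$$

The total life span $S_n$ has

$$E(S_n) = 2n \qquad \text{and} \qquad Var(S_n) = \frac{n}{16}.$$

By the central limit theorem, we get

$$\frac{S_n - E(S_n)}{\sqrt{\frac{n}{16}}} \sim N(0, 1).$$

Thus, we seek $n$ such that

$$\begin{aligned}
0.95 &= P(S_n \geq 60) \\
&= P\left(\frac{S_n - 2n}{\frac{\sqrt{n}}{4}} \geq \frac{60 - 2n}{\frac{\sqrt{n}}{4}}\right) \\
&= P\left(Z \geq \frac{240 - 8n}{\sqrt{n}}\right) \\
&= 1 - P\left(Z \leq \frac{240 - 8n}{\sqrt{n}}\right).
\end{aligned}$$

From the standard normal table, we get

$$\frac{240 - 8n}{\sqrt{n}} = -1.645$$

which implies

$$8n - 1.645\sqrt{n} - 240 = 0.$$

Solving this quadratic equation for $\sqrt{n}$, we get

$$\sqrt{n} = -5.375 \quad \text{or} \quad 5.581.$$

Since $\sqrt{n}$ must be positive, $\sqrt{n} = 5.581$. Thus $n = 31.15$. So we should buy 32 bulbs.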
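The quadratic in $\sqrt{n}$ from Example 13.14 can be solved numerically, and the answer checked by evaluating the normal approximation on either side of it. This is a sketch; the names `prob_supply_lasts`, `n_bulbs`, and the other variables are chosen here for illustration.

```python
import math
from statistics import NormalDist

mu, sigma, target, conf = 2.0, 0.25, 60.0, 0.95   # months; 5 years = 60 months
z = NormalDist().inv_cdf(conf)                    # about 1.645

# (target - mu*n) / (sigma*sqrt(n)) = -z is a quadratic in s = sqrt(n):
#     mu*s^2 - z*sigma*s - target = 0   (positive root taken)
s = (z * sigma + math.sqrt((z * sigma) ** 2 + 4 * mu * target)) / (2 * mu)
n_exact = s ** 2
n_bulbs = math.ceil(n_exact)

def prob_supply_lasts(n):
    """Normal approximation to P(S_n >= target)."""
    return 1 - NormalDist().cdf((target - mu * n) / (sigma * math.sqrt(n)))

print(round(s, 3), round(n_exact, 2), n_bulbs)
print(prob_supply_lasts(n_bulbs - 1), prob_supply_lasts(n_bulbs))
```

Under the approximation, one bulb fewer falls short of the 95% requirement while `n_bulbs` meets it.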

Example 13.15. American Airlines claims that the average number of people
who pay for in-flight movies, when the plane is fully loaded, is 42 with a
standard deviation of 8. A sample of 36 fully loaded planes is taken. What
is the probability that, on average, fewer than 38 people per plane paid for
the in-flight movies?

Answer: Here, we would like to find $P(\overline{X} < 38)$. Since we do not know the
distribution of $\overline{X}$, we will use the central limit theorem. We are given that
the population mean is $\mu = 42$ and the population standard deviation is $\sigma = 8$.
Moreover, we are dealing with a sample of size $n = 36$. Thus

$$\begin{aligned}
P(\overline{X} < 38) &= P\left(\frac{\overline{X} - 42}{\frac{8}{6}} < \frac{38 - 42}{\frac{8}{6}}\right) \\
&= P(Z < -3) \\
&= 1 - P(Z < 3) \\
&= 1 - 0.9987 \\
&= 0.0013.
\end{aligned}$$

Since we have not yet seen the proof of the central limit theorem, first
let us go through some examples to see the main idea behind the proof of the
central limit theorem. Later, at the end of this section, a proof of the central
limit theorem will be given. We know from the central limit theorem that if
$X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a distribution with mean $\mu$
and variance $\sigma^2$, then

$$\frac{\overline{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) \qquad \text{as } n \to \infty.$$

However, the above expression is not equivalent to

$$\overline{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \qquad \text{as } n \to \infty$$

as the following example shows.

Example 13.16. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from
a gamma distribution with parameters $\theta = 1$ and $\alpha = 1$. What is the
distribution of the sample mean $\overline{X}$? Also, what is the limiting distribution
of $\overline{X}$ as $n \to \infty$?

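Answer: Since each $X_i \sim GAM(1, 1)$, the probability density function of
each $X_i$ is given by

$$f(x) = \begin{cases} e^{-x} & \text{if } x \geq 0 \\ 0 & \text{otherwise} \end{cases}$$

and hence the moment generating function of each $X_i$ is

$$M_{X_i}(t) = \frac{1}{1 - t}.$$

First we determine the moment generating function of the sample mean $\overline{X}$,
and then examine this moment generating function to find the probability
distribution of $\overline{X}$. Since

$$\begin{aligned}
M_{\overline{X}}(t) &= M_{\frac{1}{n} \sum_{i=1}^{n} X_i}(t) \\
&= \prod_{i=1}^{n} M_{X_i}\left(\frac{t}{n}\right) \\
&= \prod_{i=1}^{n} \frac{1}{1 - \frac{t}{n}} \\
&= \frac{1}{\left(1 - \frac{t}{n}\right)^n},
\end{aligned}$$

therefore $\overline{X} \sim GAM\left(\frac{1}{n}, n\right)$.

Next, we find the limiting distribution of $\overline{X}$ as $n \to \infty$. This can be
done again by finding the limiting moment generating function of $\overline{X}$ and
identifying the distribution of $\overline{X}$. Consider

$$\begin{aligned}
\lim_{n\to\infty} M_{\overline{X}}(t) &= \lim_{n\to\infty} \frac{1}{\left(1 - \frac{t}{n}\right)^n} \\
&= \frac{1}{\lim_{n\to\infty} \left(1 - \frac{t}{n}\right)^n} \\
&= \frac{1}{e^{-t}} \\
&= e^{t}.
\end{aligned}$$

Thus, in the limit, the sample mean $\overline{X}$ has a degenerate distribution, that is, all the
probability mass is concentrated at one point of the space of $\overline{X}$.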
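Example 13.17. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from
a gamma distribution with parameters $\theta = 1$ and $\alpha = 1$. What is the
distribution of

$$\frac{\overline{X} - \mu}{\sigma/\sqrt{n}} \qquad \text{as } n \to \infty$$

where $\mu$ and $\sigma^2$ are the population mean and variance, respectively?

Answer: From Example 13.16, we know that

$$M_{\overline{X}}(t) = \frac{1}{\left(1 - \frac{t}{n}\right)^n}.$$

Since the population distribution is gamma with $\theta = 1$ and $\alpha = 1$, the
population mean $\mu$ is 1 and population variance $\sigma^2$ is also 1. Therefore

$$\begin{aligned}
M_{\frac{\overline{X} - 1}{1/\sqrt{n}}}(t) &= M_{\sqrt{n}\,\overline{X} - \sqrt{n}}(t) \\
&= e^{-\sqrt{n}\, t}\, M_{\overline{X}}\left(\sqrt{n}\, t\right) \\
&= e^{-\sqrt{n}\, t}\, \frac{1}{\left(1 - \frac{\sqrt{n}\, t}{n}\right)^n} \\
&= \frac{1}{e^{\sqrt{n}\, t} \left(1 - \frac{t}{\sqrt{n}}\right)^n}.
\end{aligned}$$

The limiting moment generating function can be obtained by taking the limit
of the above expression as $n$ tends to infinity. That is,

$$\begin{aligned}
\lim_{n\to\infty} M_{\frac{\overline{X} - 1}{1/\sqrt{n}}}(t) &= \lim_{n\to\infty} \frac{1}{e^{\sqrt{n}\, t} \left(1 - \frac{t}{\sqrt{n}}\right)^n} \\
&= e^{\frac{1}{2} t^2} \qquad \text{(using MAPLE)}.
\end{aligned}$$

Hence

$$\frac{\overline{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) \qquad \text{as } n \to \infty.$$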
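The limit quoted from MAPLE in Example 13.17 can be checked numerically. The sketch below evaluates the moment generating function $1 / \left(e^{\sqrt{n}\,t} (1 - t/\sqrt{n})^n\right)$ through logarithms to avoid overflow for large $n$; the function name `mgf_standardized_mean` and the chosen values of $t$ and $n$ are illustrative assumptions.

```python
import math

def mgf_standardized_mean(t, n):
    """MGF of sqrt(n) * (Xbar - 1) for an Exp(1) sample, valid for t < sqrt(n).

    Computes exp(-sqrt(n)*t - n*log(1 - t/sqrt(n))), which equals
    1 / (e^{sqrt(n) t} * (1 - t/sqrt(n))^n).
    """
    root_n = math.sqrt(n)
    return math.exp(-root_n * t - n * math.log1p(-t / root_n))

t = 1.0
for n in (10, 1_000, 100_000, 10_000_000):
    print(n, mgf_standardized_mean(t, n))
print("limit e^(t^2/2):", math.exp(t * t / 2))
```

The printed values should approach $e^{t^2/2}$, the moment generating function of the standard normal distribution, as $n$ grows.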


The following theorem is used to prove the central limit theorem. 

Theorem 13.13 (Lévy Continuity Theorem). Let $X_1, X_2, \ldots$ be a sequence
of random variables with distribution functions $F_1(x), F_2(x), \ldots$ and
moment generating functions $M_{X_1}(t), M_{X_2}(t), \ldots$, respectively. Let $X$ be a
random variable with distribution function $F(x)$ and moment generating
function $M_X(t)$. If for all $t$ in the open interval $(-h, h)$ for some $h > 0$

$$\lim_{n\to\infty} M_{X_n}(t) = M_X(t),$$

then at the points of continuity of $F(x)$

$$\lim_{n\to\infty} F_n(x) = F(x).$$


The proof of this theorem is beyond the scope of this book. 
The following limit

$$\lim_{n\to\infty} \left[1 + \frac{t}{n} + \frac{d(n)}{n}\right]^n = e^{t}, \qquad \text{if } \lim_{n\to\infty} d(n) = 0, \tag{13.4}$$

whose proof we leave to the reader, can be established using advanced
calculus. Here $t$ is independent of $n$.

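Now we proceed to prove the central limit theorem assuming that the
moment generating function of the population $X$ exists. Let $M_{X-\mu}(t)$ be
the moment generating function of the random variable $X - \mu$. We denote
$M_{X-\mu}(t)$ as $M(t)$ when there is no danger of confusion. Then

$$\left.\begin{aligned}
M(0) &= 1, \\
M'(0) &= E(X - \mu) = E(X) - \mu = \mu - \mu = 0, \\
M''(0) &= E\left((X - \mu)^2\right) = \sigma^2.
\end{aligned}\right\} \tag{13.5}$$

By Taylor series expansion of $M(t)$ about 0, we get

$$M(t) = M(0) + M'(0)\, t + \frac{1}{2} M''(\eta)\, t^2$$

where $\eta \in (0, t)$. Hence using (13.5), we have

$$\begin{aligned}
M(t) &= 1 + \frac{1}{2} M''(\eta)\, t^2 \\
&= 1 + \frac{1}{2} \sigma^2 t^2 + \frac{1}{2} M''(\eta)\, t^2 - \frac{1}{2} \sigma^2 t^2 \\
&= 1 + \frac{1}{2} \sigma^2 t^2 + \frac{1}{2} \left[M''(\eta) - \sigma^2\right] t^2.
\end{aligned}$$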


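Now using $M(t)$ we compute the moment generating function of $Z_n$. Note
that

$$Z_n = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}} = \frac{1}{\sigma\sqrt{n}} \sum_{i=1}^{n} (X_i - \mu).$$

Hence

$$\begin{aligned}
M_{Z_n}(t) &= \prod_{i=1}^{n} M_{X_i - \mu}\left(\frac{t}{\sigma\sqrt{n}}\right) \\
&= \prod_{i=1}^{n} M_{X - \mu}\left(\frac{t}{\sigma\sqrt{n}}\right) \\
&= \left[M\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^n \\
&= \left[1 + \frac{t^2}{2n} + \frac{\left(M''(\eta) - \sigma^2\right) t^2}{2 n \sigma^2}\right]^n
\end{aligned}$$

for $0 < |\eta| < \frac{1}{\sigma\sqrt{n}}\, |t|$. Note that since $0 < |\eta| < \frac{1}{\sigma\sqrt{n}}\, |t|$, we have

$$\lim_{n\to\infty} \frac{t}{\sigma\sqrt{n}} = 0, \qquad \lim_{n\to\infty} \eta = 0, \qquad \text{and} \qquad \lim_{n\to\infty} M''(\eta) - \sigma^2 = 0. \tag{13.6}$$

Letting

$$d(n) = \frac{\left(M''(\eta) - \sigma^2\right) t^2}{2 \sigma^2}$$

and using (13.6), we see that $\lim_{n\to\infty} d(n) = 0$, and

$$M_{Z_n}(t) = \left[1 + \frac{t^2}{2n} + \frac{d(n)}{n}\right]^n. \tag{13.7}$$

Using (13.7) we have

$$\lim_{n\to\infty} M_{Z_n}(t) = \lim_{n\to\infty} \left[1 + \frac{t^2}{2n} + \frac{d(n)}{n}\right]^n = e^{\frac{1}{2} t^2}.$$

Hence by the Lévy continuity theorem, we obtain

$$\lim_{n\to\infty} F_n(x) = \Phi(x)$$

where $\Phi(x)$ is the cumulative distribution function of the standard normal
distribution. Thus $Z_n \xrightarrow{d} Z$ and the proof of the theorem is now complete.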

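Now we give another proof of the central limit theorem using L'Hospital
rule. This proof is essentially due to Tardiff (1981).

As before, let $Z_n = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}}$. Then $M_{Z_n}(t) = \left[M\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^n$ where $M(t)$ is
the moment generating function of the random variable $X - \mu$. Hence from
(13.5), we have $M(0) = 1$, $M'(0) = 0$, and $M''(0) = \sigma^2$. Letting $h = \frac{t}{\sigma\sqrt{n}}$,
we see that $n = \frac{t^2}{\sigma^2 h^2}$. Hence if $n \to \infty$, then $h \to 0$. Using these and
applying the L'Hospital rule twice, we compute

$$\begin{aligned}
\lim_{n\to\infty} M_{Z_n}(t) &= \lim_{n\to\infty} \left[M\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^n \\
&= \lim_{n\to\infty} \exp\left(n \ln M\left(\frac{t}{\sigma\sqrt{n}}\right)\right) \\
&= \lim_{h\to 0} \exp\left(\frac{t^2}{\sigma^2}\, \frac{\ln M(h)}{h^2}\right) && \left(\tfrac{0}{0} \text{ form}\right) \\
&= \lim_{h\to 0} \exp\left(\frac{t^2}{\sigma^2}\, \frac{\frac{1}{M(h)} M'(h)}{2h}\right) && \text{(L'Hospital rule)} \\
&= \lim_{h\to 0} \exp\left(\frac{t^2}{\sigma^2}\, \frac{M'(h)}{2h\, M(h)}\right) && \left(\tfrac{0}{0} \text{ form}\right) \\
&= \lim_{h\to 0} \exp\left(\frac{t^2}{\sigma^2}\, \frac{M''(h)}{2 M(h) + 2h\, M'(h)}\right) && \text{(L'Hospital rule)} \\
&= \exp\left(\frac{t^2}{\sigma^2}\, \frac{M''(0)}{2 M(0)}\right) \\
&= \exp\left(\frac{t^2}{\sigma^2}\, \frac{\sigma^2}{2}\right) \\
&= \exp\left(\frac{1}{2}\, t^2\right).
\end{aligned}$$

Hence by the Lévy continuity theorem, we obtain

$$\lim_{n\to\infty} F_n(x) = \Phi(x)$$

where $\Phi(x)$ is the cumulative distribution function of the standard normal
distribution. Thus as $n \to \infty$, the random variable $Z_n \xrightarrow{d} Z$, where $Z \sim N(0, 1)$.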

Remark 13.3. In contrast to the moment generating function, since the 
characteristic function of a random variable always exists, the original proof 
of the central limit theorem involved the characteristic function (see for ex¬ 
ample An Introduction to Probability Theory and Its Applications, Volume II 
by Feller). In 1988, Brown gave an elementary proof using very clever Tay¬ 
lor series expansions, where the use of the characteristic function has been 
avoided. 

13.4. Order Statistics 

Often, sample values such as the smallest, largest, or middle observation 
from a random sample provide important information. For example, the 
highest flood water or lowest winter temperature recorded during the last
50 years might be useful when planning for future emergencies. The median 
price of houses sold during the previous month might be useful for estimating 
the cost of living. The statistics highest, lowest or median are examples of 
order statistics. 

Definition 13.4. Let $X_1, X_2, \ldots, X_n$ be observations from a random sample
of size $n$ from a distribution $f(x)$. Let $X_{(1)}$ denote the smallest of
$\{X_1, X_2, \ldots, X_n\}$, $X_{(2)}$ denote the second smallest of $\{X_1, X_2, \ldots, X_n\}$, and
similarly $X_{(r)}$ denote the $r$th smallest of $\{X_1, X_2, \ldots, X_n\}$. Then the random
variables $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ are called the order statistics of the sample
$X_1, X_2, \ldots, X_n$. In particular, $X_{(r)}$ is called the $r$th-order statistic of
$X_1, X_2, \ldots, X_n$.

The sample range, $R$, is the distance between the smallest and the largest
observation. That is,

$$R = X_{(n)} - X_{(1)}.$$

This is an important statistic which is defined using order statistics.

The distributions of the order statistics are very important when one uses
them in any statistical investigation. The next theorem gives the distribution
of an order statistic.

Theorem 13.14. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a
distribution with density function $f(x)$. Then the probability density function
of the $r$th order statistic, $X_{(r)}$, is

$$g(x) = \frac{n!}{(r-1)!\,(n-r)!}\, [F(x)]^{r-1}\, f(x)\, [1 - F(x)]^{n-r},$$

where $F(x)$ denotes the cdf of $f(x)$.

Proof: We prove the theorem assuming $f(x)$ continuous. In the case $f(x)$ is
discrete the proof has to be modified appropriately. Let $h$ be a positive real
number and $x$ be an arbitrary point in the domain of $f$. Let us divide the
real line into three segments, namely

$$\mathbb{R} = (-\infty, x) \cup [x, x + h) \cup [x + h, \infty).$$

The probability, say $p_1$, that a sample value falls into the first interval $(-\infty, x)$
is given by

$$p_1 = \int_{-\infty}^{x} f(t)\, dt = F(x).$$

Similarly, the probability $p_2$ that a sample value falls into the second interval
$[x, x + h)$ is

$$p_2 = \int_{x}^{x+h} f(t)\, dt = F(x + h) - F(x).$$

By the same token, we can compute the probability $p_3$ that a sample value
falls into the third interval:

$$p_3 = \int_{x+h}^{\infty} f(t)\, dt = 1 - F(x + h).$$

Then the probability, $P_h(x)$, that $(r - 1)$ sample values fall in the first interval,
one falls in the second interval, and $(n - r)$ fall in the third interval is

$$P_h(x) = \binom{n}{r-1,\, 1,\, n-r}\, p_1^{r-1}\, p_2^{1}\, p_3^{n-r} = \frac{n!}{(r-1)!\,(n-r)!}\, p_1^{r-1}\, p_2\, p_3^{n-r}.$$

Hence the probability density function $g(x)$ of the $r$th order statistic is given by

$$\begin{aligned}
g(x) &= \lim_{h\to 0} \frac{P_h(x)}{h} \\
&= \lim_{h\to 0} \left[\frac{n!}{(r-1)!\,(n-r)!}\, p_1^{r-1}\, \frac{p_2}{h}\, p_3^{n-r}\right] \\
&= \frac{n!}{(r-1)!\,(n-r)!}\, [F(x)]^{r-1}\, \lim_{h\to 0} \frac{F(x+h) - F(x)}{h}\, \lim_{h\to 0} [1 - F(x+h)]^{n-r} \\
&= \frac{n!}{(r-1)!\,(n-r)!}\, [F(x)]^{r-1}\, F'(x)\, [1 - F(x)]^{n-r} \\
&= \frac{n!}{(r-1)!\,(n-r)!}\, [F(x)]^{r-1}\, f(x)\, [1 - F(x)]^{n-r}.
\end{aligned}$$

Example 13.18. Let $X_1, X_2$ be a random sample from a distribution with
density function

$$f(x) = \begin{cases} e^{-x} & \text{for } 0 \leq x < \infty \\ 0 & \text{otherwise.} \end{cases}$$

What is the density function of $Y = \min\{X_1, X_2\}$ where nonzero?

Answer: The cumulative distribution function of $f(x)$ is

$$F(x) = \int_{0}^{x} e^{-t}\, dt = 1 - e^{-x}.$$

In this example, $n = 2$ and $r = 1$. Hence, the density of $Y$ is

$$\begin{aligned}
g(y) &= \frac{2!}{0!\, 1!}\, [F(y)]^{0}\, f(y)\, [1 - F(y)] \\
&= 2 f(y)\, [1 - F(y)] \\
&= 2 e^{-y} \left(1 - 1 + e^{-y}\right) \\
&= 2 e^{-2y}.
\end{aligned}$$
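The formula of Theorem 13.14 translates directly into a small function, which can then be compared with the closed form $2e^{-2y}$ obtained in Example 13.18. This is a sketch; the function name `order_statistic_pdf` and its argument order are chosen here for illustration.

```python
import math

def order_statistic_pdf(x, r, n, pdf, cdf):
    """Density of the r-th order statistic of a sample of size n (Theorem 13.14)."""
    coeff = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    return coeff * cdf(x) ** (r - 1) * pdf(x) * (1 - cdf(x)) ** (n - r)

# Example 13.18: exponential density e^{-x} on x >= 0, with n = 2 and r = 1.
def exp_pdf(x):
    return math.exp(-x) if x >= 0 else 0.0

def exp_cdf(x):
    return 1 - math.exp(-x) if x >= 0 else 0.0

for y in (0.1, 0.7, 2.0):
    # The two printed columns after y should agree up to rounding error.
    print(y, order_statistic_pdf(y, 1, 2, exp_pdf, exp_cdf), 2 * math.exp(-2 * y))
```

Passing the density and the cdf as functions keeps the helper reusable for other populations, for instance $f(x) = 2x$ on $(0, 1)$ with $F(x) = x^2$.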


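Example 13.19. Let $Y_1 < Y_2 < \cdots < Y_6$ be the order statistics from a
random sample of size 6 from a distribution with density function

$$f(x) = \begin{cases} 2x & \text{for } 0 < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$

What is the expected value of $Y_6$?

Answer:

$$f(x) = 2x$$

$$F(x) = \int_{0}^{x} 2t\, dt = x^2.$$

The density function of $Y_6$ is given by

$$\begin{aligned}
g(y) &= \frac{6!}{5!\, 0!}\, [F(y)]^{5}\, f(y) \\
&= 6 \left(y^2\right)^5\, 2y \\
&= 12\, y^{11}.
\end{aligned}$$

Hence, the expected value of $Y_6$ is

$$\begin{aligned}
E(Y_6) &= \int_{0}^{1} y\, g(y)\, dy \\
&= \int_{0}^{1} y\, 12\, y^{11}\, dy \\
&= \frac{12}{13} \left[y^{13}\right]_{0}^{1} \\
&= \frac{12}{13}.
\end{aligned}$$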


Example 13.20. Let $X$, $Y$ and $Z$ be independent uniform random variables
on the interval $(0, a)$. Let $W = \min\{X, Y, Z\}$. What is the expected value of
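Answer: The probability distribution of $X$ (or $Y$ or $Z$) is

$$f(x) = \begin{cases} \frac{1}{a} & \text{if } 0 < x < a \\ 0 & \text{otherwise.} \end{cases}$$

Thus the cumulative distribution function of $f(x)$ is given by

$$F(x) = \begin{cases} 0 & \text{if } x \leq 0 \\ \frac{x}{a} & \text{if } 0 < x < a \\ 1 & \text{if } x \geq a. \end{cases}$$

Since $W = \min\{X, Y, Z\}$, $W$ is the first order statistic of the random sample
$X, Y, Z$. Thus, the density function of $W$ is given by

$$\begin{aligned}
g(w) &= \frac{3!}{0!\, 1!\, 2!}\, [F(w)]^{0}\, f(w)\, [1 - F(w)]^{2} \\
&= 3 f(w)\, [1 - F(w)]^{2} \\
&= \frac{3}{a} \left(1 - \frac{w}{a}\right)^2.
\end{aligned}$$

Thus, the pdf of $W$ is given by

$$g(w) = \begin{cases} \frac{3}{a} \left(1 - \frac{w}{a}\right)^2 & \text{if } 0 < w < a \\ 0 & \text{otherwise.} \end{cases}$$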


The expected value of W is 
Example 13.21. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population
$X$ with uniform distribution on the interval $[0, 1]$. What is the probability
distribution of the sample range $W := X_{(n)} - X_{(1)}$?

Answer: To find the distribution of $W$, we need the joint distribution of the
random variable $\left(X_{(n)}, X_{(1)}\right)$. The joint distribution of $\left(X_{(1)}, X_{(n)}\right)$ is given
by

$$h(x_1, x_n) = n(n - 1)\, f(x_1)\, f(x_n)\, [F(x_n) - F(x_1)]^{n-2},$$

where $x_n \geq x_1$ and $f(x)$ is the probability density function of $X$. To
determine the probability distribution of the sample range $W$, we consider the
transformation

$$\left.\begin{aligned} U &= X_{(1)} \\ W &= X_{(n)} - X_{(1)} \end{aligned}\right\}$$

which has an inverse

$$\left.\begin{aligned} X_{(1)} &= U \\ X_{(n)} &= U + W. \end{aligned}\right\}$$

The Jacobian of this transformation is

$$J = \det \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} = 1.$$

Hence the joint density of $(U, W)$ is given by

$$\begin{aligned}
g(u, w) &= |J|\, h(x_1, x_n) \\
&= n(n - 1)\, f(u)\, f(u + w)\, [F(u + w) - F(u)]^{n-2}
\end{aligned}$$

where $w \geq 0$. The factors $f(u)$ and $f(u + w)$ are simultaneously nonzero if $0 \leq u \leq 1$
and $0 \leq u + w \leq 1$, that is, if $0 \leq u \leq 1 - w$. On this region $f(u) = f(u + w) = 1$
and $F(u + w) - F(u) = w$. Thus, the probability density of $W$ is given by

$$\begin{aligned}
\int_{-\infty}^{\infty} g(u, w)\, du &= \int_{-\infty}^{\infty} n(n - 1)\, f(u)\, f(u + w)\, [F(u + w) - F(u)]^{n-2}\, du \\
&= n(n - 1)\, w^{n-2} \int_{0}^{1-w} du \\
&= n(n - 1)\, (1 - w)\, w^{n-2}
\end{aligned}$$

where $0 \leq w \leq 1$.
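The density of the sample range derived in Example 13.21 can be compared with a simulation. Integrating $n(n-1)(1-u)u^{n-2}$ from 0 to $w$ gives $P(W \le w) = n w^{n-1} - (n-1) w^{n}$, which the sketch below compares with an empirical frequency. The sample size $n = 5$, the point $w = 0.5$, and the function names are assumptions made for illustration.

```python
import random

def range_cdf(w, n):
    """P(W <= w), obtained by integrating n(n-1)(1-u)u^(n-2) from 0 to w."""
    return n * w ** (n - 1) - (n - 1) * w ** n

def simulated_range_cdf(w, n, reps=20_000, seed=0):
    """Empirical frequency of {sample range <= w} for uniform(0, 1) samples of size n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        if max(sample) - min(sample) <= w:
            hits += 1
    return hits / reps

# Closed-form value next to the Monte Carlo estimate.
print(range_cdf(0.5, 5), simulated_range_cdf(0.5, 5))
```

The two numbers should agree to within Monte Carlo error, which shrinks as `reps` grows.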
13.5. Sample Percentiles


The sample median, M , is a number such that approximately one-half 
of the observations are less than M and one-half are greater than M. 

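Definition 13.5. Let $X_1, X_2, \ldots, X_n$ be a random sample. The sample
median $M$ is defined as

$$M = \begin{cases} X_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\ \frac{1}{2} \left[X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n+2}{2}\right)}\right] & \text{if } n \text{ is even.} \end{cases}$$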


The median is a measure of location, like the sample mean.

Recall that for a continuous distribution, the $100p$th percentile, $\pi_p$, is a number
such that

$$p = \int_{-\infty}^{\pi_p} f(x)\, dx.$$

Definition 13.6. The $100p$th sample percentile is defined as

$$\pi_p = \begin{cases} X_{([np])} & \text{if } p < 0.5 \\ M & \text{if } p = 0.5 \\ X_{(n + 1 - [n(1-p)])} & \text{if } p > 0.5, \end{cases}$$

where $[b]$ denotes the number $b$ rounded to the nearest integer.

Example 13.22. Let $X_1, X_2, \ldots, X_{12}$ be a random sample of size 12. What
is the 65th percentile of this sample?

Answer:

$$\begin{aligned}
100p &= 65 \\
p &= 0.65 \\
n(1 - p) &= (12)(1 - 0.65) = 4.2 \\
[n(1 - p)] &= [4.2] = 4
\end{aligned}$$

Hence by definition the 65th percentile is

$$\begin{aligned}
\pi_{0.65} &= X_{(n + 1 - [n(1-p)])} \\
&= X_{(13 - 4)} \\
&= X_{(9)}.
\end{aligned}$$

Thus, the 65th percentile of the random sample $X_1, X_2, \ldots, X_{12}$ is the 9th-order
statistic.
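Definition 13.6 can be turned into a short routine. The sketch below is one possible reading: it rounds halves upward in $[b]$ (the definition does not say how ties are broken, so this is an assumption), and it raises an error when the computed rank falls outside $1, \ldots, n$. The function names and the test sample are hypothetical.

```python
import math

def nearest_integer(b):
    """[b]: b rounded to the nearest integer (halves rounded up, an assumption)."""
    return math.floor(b + 0.5)

def sample_percentile(sample, p):
    """100p-th sample percentile following Definition 13.6."""
    ordered = sorted(sample)
    n = len(ordered)
    if p == 0.5:
        mid = n // 2
        # Sample median as in Definition 13.5.
        return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2
    if p < 0.5:
        rank = nearest_integer(n * p)
    else:
        rank = n + 1 - nearest_integer(n * (1 - p))
    if not 1 <= rank <= n:
        raise ValueError("rank outside 1..n for this n and p")
    return ordered[rank - 1]      # X_(rank); ranks are 1-based

data = list(range(1, 13))         # hypothetical sample in which X_(k) = k
print(sample_percentile(data, 0.65))
```

With this artificial sample the returned value equals the rank of the order statistic, so the call reproduces the rank found in Example 13.22.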


For any number p between 0 and 1, the 100p th sample percentile is an 
observation such that approximately np observations are less than this ob¬ 
servation and n(l — p) observations are greater than this. 

Definition 13.7. The 25 th percentile is called the lower quartile while the 
75 th percentile is called the upper quartile. The distance between these two 
quartiles is called the interquartile range. 

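Example 13.23. If a sample of size 3 from a uniform distribution over $[0, 1]$
is observed, what is the probability that the sample median is between $\frac{1}{4}$ and
$\frac{3}{4}$?

Answer: When a sample of $(2n + 1)$ random variables is observed, the
$(n + 1)$th smallest random variable is called the sample median. For our
problem, the sample median is given by

$$X_{(2)} = \text{2nd smallest of } \{X_1, X_2, X_3\}.$$

Let $Y = X_{(2)}$. The density function of each $X_i$ is given by

$$f(x) = \begin{cases} 1 & \text{if } 0 \leq x \leq 1 \\ 0 & \text{otherwise.} \end{cases}$$

Hence, the cumulative distribution function of $f(x)$ is

$$F(x) = x.$$

Thus the density function of $Y$ is given by

$$\begin{aligned}
g(y) &= \frac{3!}{1!\, 1!}\, [F(y)]^{2-1}\, f(y)\, [1 - F(y)]^{3-2} \\
&= 6\, F(y)\, f(y)\, [1 - F(y)] \\
&= 6 y (1 - y).
\end{aligned}$$

Therefore

$$\begin{aligned}
P\left(\frac{1}{4} < Y < \frac{3}{4}\right) &= \int_{\frac{1}{4}}^{\frac{3}{4}} g(y)\, dy \\
&= \int_{\frac{1}{4}}^{\frac{3}{4}} 6 y (1 - y)\, dy \\
&= 6 \left[\frac{y^2}{2} - \frac{y^3}{3}\right]_{\frac{1}{4}}^{\frac{3}{4}} \\
&= \frac{11}{16}.
\end{aligned}$$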
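The integral in Example 13.23 can be verified with exact rational arithmetic. The sketch below uses the antiderivative $3y^2 - 2y^3$ of $g(y) = 6y(1-y)$; the function name `median_cdf` is chosen here for illustration.

```python
from fractions import Fraction

def median_cdf(y):
    """Antiderivative of g(y) = 6y(1 - y), i.e. P(Y <= y) for 0 <= y <= 1."""
    return 3 * y**2 - 2 * y**3

# P(1/4 < Y < 3/4) as a difference of cdf values, computed exactly.
prob_median = median_cdf(Fraction(3, 4)) - median_cdf(Fraction(1, 4))
print(prob_median)
```

Working with `Fraction` avoids floating-point rounding, so the result can be compared with the fraction in the text directly.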
13.6. Review Exercises

1. Suppose we roll a die 1000 times. What is the probability that the sum 
of the numbers obtained lies between 3000 and 4000? 

2. Suppose Kathy flips a coin 1000 times. What is the probability she will
get at least 600 heads?

3. At a certain large university the weights of the male students and female
students are approximately normally distributed with means and standard
deviations of 180 and 20, and 130 and 15, respectively. If a male and a female
are selected at random, what is the probability that the sum of their weights
is less than 280?

4. Seven observations are drawn from a population with an unknown con¬ 
tinuous distribution. What is the probability that the least and the greatest 
observations bracket the median? 

5. If the random variable $X$ has the density function

$$f(x) = \begin{cases} 2(1 - x) & \text{for } 0 \leq x \leq 1 \\ 0 & \text{otherwise,} \end{cases}$$

what is the probability that the larger of 2 independent observations of $X$
will exceed $\frac{1}{2}$?

6. Let $X_1, X_2, X_3$ be a random sample from the uniform distribution on the
interval $(0, 1)$. What is the probability that the sample median is less than
0.4?

7. Let $X_1, X_2, X_3, X_4, X_5$ be a random sample from the uniform distribution
on the interval $(0, \theta)$, where $\theta$ is unknown, and let $X_{\max}$ denote the largest
observation. For what value of the constant $k$ is the expected value of the
random variable $k X_{\max}$ equal to $\theta$?

8 . A random sample of size 16 is to be taken from a normal population having 
mean 100 and variance 4. What is the 90 th percentile of the distribution of 
the sample mean? 

9. If the density function of a random variable $X$ is given by

$$f(x) = \begin{cases} \frac{1}{2x} & \text{for } \frac{1}{e} < x < e \\ 0 & \text{otherwise,} \end{cases}$$

what is the probability that one of the two independent observations of $X$ is
less than 2 and the other is greater than 1?

10. Five observations have been drawn independently and at random from 
a continuous distribution. What is the probability that the next observation 
will be less than all of the first 5? 


11 . Let the random variable X denote the length of time it takes to complete 
a mathematics assignment. Suppose the density function of X is given by 


f(x) 


e for 6 < x < oo 

0 otherwise, 


where 9 is a positive constant that represents the minimum time to complete 
a mathematics assignment. If X-[. X ^,..., X Fj is a random sample from this 
distribution. What is the expected value of X^? 

12. Let $X$ and $Y$ be two independent random variables with identical
probability density function given by

$$f(x) = \begin{cases} e^{-x} & \text{for } x > 0 \\ 0 & \text{elsewhere.} \end{cases}$$

What is the probability density function of $W = \max\{X, Y\}$?

13. Let $X$ and $Y$ be two independent random variables with identical
probability density function given by

                    for $0 < x < \theta$
                    elsewhere,

for some $\theta > 0$. What is the probability density function of $W = \min\{X, Y\}$?


14. Let X\,X 2 , ■■■,X n be a random sample from a uniform distribution on 
the interval from 0 to 5. What is the limiting moment generating function 
of as n —► oo? 

15. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a normal
distribution with mean $\mu$ and variance 1. If the 75th percentile of the statistic
$W = \sum_{i=1}^{n} \left(X_i - \overline{X}\right)^2$ is 28.24, what is the sample size $n$?

16. Let Xi, X 2 , ..., X n be a random sample of size n from a Bernoulli distri¬ 
bution with probability of success p = What is the limiting distribution 
the sample mean X ? 
17. Let $X_1, X_2, \ldots, X_{1995}$ be a random sample of size 1995 from a distribution
with probability density function

$$f(x) = \frac{e^{-\lambda}\, \lambda^{x}}{x!} \qquad x = 0, 1, 2, 3, \ldots, \infty.$$

What is the distribution of $1995\, \overline{X}$?


18. Suppose Xi, X 2 , ..., X n is a random sample from the uniform distribution 
on (0,1) and Z be the sample range. What is the probability that Z is less 
than or equal to 0.5? 

19. Let Xi, X 2 , ...,Xg be a random sample from a uniform distribution on 
the interval [1,12]. Find the probability that the next to smallest is greater 
than or equal to 4? 

20. A machine needs 4 out of its 6 independent components to operate. Let
$X_1, X_2, \ldots, X_6$ be the lifetimes of the respective components. Suppose each is
exponentially distributed with parameter $\theta$. What is the probability density
function of the machine lifetime?

21. Suppose $X_1, X_2, \ldots, X_{2n+1}$ is a random sample from the uniform
distribution on $(0, 1)$. What is the probability density function of the sample
median $X_{(n+1)}$?

22. Let $X$ and $Y$ be two random variables with joint density

$$f(x, y) = \begin{cases} 12x & \text{if } 0 < y < 2x < 1 \\ 0 & \text{otherwise.} \end{cases}$$

What is the expected value of the random variable $Z = X^2 Y^3 + X^2 - X - Y^3$?

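23. Let $X_1, X_2, \ldots, X_{50}$ be a random sample of size 50 from a distribution
with density

$$f(x) = \begin{cases} \frac{1}{\Gamma(\alpha)\, \theta^{\alpha}}\, x^{\alpha - 1}\, e^{-\frac{x}{\theta}} & \text{for } 0 < x < \infty \\ 0 & \text{otherwise.} \end{cases}$$

What are the mean and variance of the sample mean $\overline{X}$?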


24. Let Xi,X 2 , ...,Xioo be a random sample of size 100 from a distribution 
with density 


f(x) = 


0 


for x = 0,1, 2,..., 00 
otherwise. 


What is the probability that X greater than or equal to 1? 
Chapter 14

SAMPLING DISTRIBUTIONS ASSOCIATED WITH THE NORMAL POPULATIONS


Given a random sample $X_1, X_2, \ldots, X_n$ from a population $X$ with probability
distribution $f(x; \theta)$, where $\theta$ is a parameter, a statistic is a function $T$
of $X_1, X_2, \ldots, X_n$, that is

$$T = T(X_1, X_2, \ldots, X_n)$$

which is free of the parameter $\theta$. If the distribution of the population is
known, then sometimes it is possible to find the probability distribution of
the statistic $T$. The probability distribution of the statistic $T$ is called the
sampling distribution of $T$. The joint distribution of the random variables
$X_1, X_2, \ldots, X_n$ is called the distribution of the sample. The distribution of
the sample is the joint density

$$f(x_1, x_2, \ldots, x_n; \theta) = f(x_1; \theta)\, f(x_2; \theta) \cdots f(x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta)$$

since the random variables $X_1, X_2, \ldots, X_n$ are independent and identically
distributed.

Since the normal population is very important in statistics, the sampling
distributions associated with the normal population are very important. The
most important sampling distributions which are associated with the normal
population are the following: the chi-square distribution, the Student's
t-distribution, the F-distribution, and the beta distribution. In this chapter,
we only consider the first three distributions, since the last distribution was
considered earlier.


14.1. Chi-square distribution 


In this section, we treat the Chi-square distribution, which is one of the 
very useful sampling distributions. 


Definition 14.1. A continuous random variable $X$ is said to have a chi-square
distribution with $r$ degrees of freedom if its probability density function
is of the form

$$f(x; r) = \begin{cases} \frac{1}{\Gamma\left(\frac{r}{2}\right) 2^{\frac{r}{2}}}\, x^{\frac{r}{2} - 1}\, e^{-\frac{x}{2}} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $r > 0$. If $X$ has a chi-square distribution, then we denote it by writing
$X \sim \chi^2(r)$. Recall that a gamma distribution reduces to a chi-square
distribution if $\alpha = \frac{r}{2}$ and $\theta = 2$. The mean and variance of $X$ are $r$ and $2r$,
respectively.

Thus, the chi-square distribution is also a special case of the gamma distribution.
Further, if $r \to \infty$, then the chi-square distribution tends to the normal distribution.

Example 14.1. If $X \sim GAM(1, 1)$, then what is the probability density
function of the random variable $2X$?

Answer: We will use the moment generating method to find the distribution
of $2X$. The moment generating function of a gamma random variable is given
by

$$M(t) = (1 - \theta t)^{-\alpha}, \qquad \text{if } t < \frac{1}{\theta}.$$

Since $X \sim GAM(1, 1)$, the moment generating function of $X$ is given by

$$M_X(t) = \frac{1}{1 - t}, \qquad t < 1.$$

Hence, the moment generating function of $2X$ is

$$\begin{aligned}
M_{2X}(t) &= M_X(2t) \\
&= \frac{1}{1 - 2t} \\
&= \frac{1}{(1 - 2t)^{\frac{2}{2}}} \\
&= \text{MGF of } \chi^2(2).
\end{aligned}$$

Hence, if $X$ is $GAM(1, 1)$, or is an exponential with parameter 1, then $2X$ is
chi-square with 2 degrees of freedom.

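Example 14.2. If $X \sim \chi^2(5)$, then what is the probability that $X$ is between
1.145 and 12.83?

Answer: The probability of $X$ between 1.145 and 12.83 can be calculated
from the following:

$$\begin{aligned}
P(1.145 \leq X \leq 12.83) &= P(X \leq 12.83) - P(X \leq 1.145) \\
&= \int_{0}^{12.83} f(x)\, dx - \int_{0}^{1.145} f(x)\, dx \\
&= \int_{0}^{12.83} \frac{1}{\Gamma\left(\frac{5}{2}\right) 2^{\frac{5}{2}}}\, x^{\frac{5}{2} - 1}\, e^{-\frac{x}{2}}\, dx - \int_{0}^{1.145} \frac{1}{\Gamma\left(\frac{5}{2}\right) 2^{\frac{5}{2}}}\, x^{\frac{5}{2} - 1}\, e^{-\frac{x}{2}}\, dx \\
&= 0.975 - 0.050 \qquad \text{(from } \chi^2 \text{ table)} \\
&= 0.925.
\end{aligned}$$

The above integrals are hard to evaluate and thus their values are taken from
the chi-square table.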

Example 14.3. If $X \sim \chi^2(7)$, then what are values of the constants $a$ and
$b$ such that $P(a < X < b) = 0.95$?

Answer: Since

$$0.95 = P(a < X < b) = P(X < b) - P(X < a),$$

we get

$$P(X < b) = 0.95 + P(X < a).$$

We choose $a = 1.690$, so that

$$P(X < 1.690) = 0.025.$$

From this, we get

$$P(X < b) = 0.95 + 0.025 = 0.975.$$

Thus, from the chi-square table, we get $b = 16.01$.


The following theorems were studied earlier in Chapters 6 and 13 and 
they are very useful in finding the sampling distributions of many statistics. 
We state these theorems here for the convenience of the reader. 

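Theorem 14.1. If $X \sim N(\mu, \sigma^2)$, then $\left(\frac{X - \mu}{\sigma}\right)^2 \sim \chi^2(1)$.

Theorem 14.2. If $X \sim N(\mu, \sigma^2)$ and $X_1, X_2, \ldots, X_n$ is a random sample
from the population $X$, then

$$\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2(n).$$

Theorem 14.3. If $X \sim N(\mu, \sigma^2)$ and $X_1, X_2, \ldots, X_n$ is a random sample
from the population $X$, then

$$\frac{(n - 1)\, S^2}{\sigma^2} \sim \chi^2(n - 1).$$

Theorem 14.4. If $X \sim GAM(\theta, \alpha)$, then

$$\frac{2}{\theta}\, X \sim \chi^2(2\alpha).$$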


Example 14.4. A new component is placed in service and $n$ spares are
available. If the times to failure in days are independent exponential
variables, that is $X_i \sim EXP(100)$, how many spares would be needed to be 95%
sure of successful operation for at least two years?

Answer: Since $X_i \sim EXP(100)$,

$$\sum_{i=1}^{n} X_i \sim GAM(100, n).$$

Hence, by Theorem 14.4, the random variable

$$Y = \frac{2}{100} \sum_{i=1}^{n} X_i \sim \chi^2(2n).$$

We have to find the number of spares $n$ such that

$$\begin{aligned}
0.95 &= P\left(\sum_{i=1}^{n} X_i \geq 2 \text{ years}\right) \\
&= P\left(\sum_{i=1}^{n} X_i \geq 730 \text{ days}\right) \\
&= P\left(\frac{2}{100} \sum_{i=1}^{n} X_i \geq \frac{2}{100}\, 730\right) \\
&= P\left(\frac{2}{100} \sum_{i=1}^{n} X_i \geq \frac{730}{50}\right) \\
&= P\left(\chi^2(2n) \geq 14.6\right).
\end{aligned}$$

From the $\chi^2$ table, we get

$$2n = 25.$$

Hence $n = 13$ (after rounding up to the next integer). Thus, 13 spares are
needed to be 95% sure of successful operation for at least two years.
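The table lookup in Example 14.4 can be cross-checked without a table. For even degrees of freedom $2n$ the upper tail of the chi-square distribution has the closed form $P(\chi^2(2n) \ge x) = e^{-x/2} \sum_{k=0}^{n-1} (x/2)^k / k!$, a standard identity that is not stated in the text and is used here as an assumption. The sketch searches for the smallest $n$ meeting the 95% requirement; all names are illustrative.

```python
import math

def chi_square_upper_tail_even(x, two_n):
    """P(chi-square(2n) >= x) for even degrees of freedom 2n, via
    exp(-x/2) * sum_{k=0}^{n-1} (x/2)^k / k!."""
    n = two_n // 2
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(n))

mean_life, horizon, conf = 100.0, 730.0, 0.95
threshold = 2 * horizon / mean_life       # 14.6, as in the example

# Smallest n with P(chi-square(2n) >= threshold) >= conf.
spares = 1
while chi_square_upper_tail_even(threshold, 2 * spares) < conf:
    spares += 1
print(spares, chi_square_upper_tail_even(threshold, 2 * spares))
```

Because the degrees of freedom $2n$ are always even here, the search steps over integer $n$ directly instead of rounding a non-integer table value.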


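Example 14.5. If $X \sim N(10, 25)$ and $X_1, X_2, \ldots, X_{501}$ is a random sample
of size 501 from the population $X$, then what is the expected value of the
sample variance $S^2$?

Answer: We will use Theorem 14.3 to do this problem. By Theorem
14.3, we see that

$$\frac{(501 - 1)\, S^2}{\sigma^2} \sim \chi^2(500).$$

Hence, the expected value of $S^2$ is given by

$$\begin{aligned}
E\left[S^2\right] &= E\left[\frac{25}{500}\, \frac{500}{25}\, S^2\right] \\
&= \frac{25}{500}\, E\left[\frac{500}{25}\, S^2\right] \\
&= \frac{1}{20}\, E\left[\chi^2(500)\right] \\
&= \frac{1}{20}\, 500 \\
&= 25.
\end{aligned}$$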
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14.2. Student’s i-distribution 

Here we treat the Student’s i-distribution, which is also one of the very 
useful sampling distributions. 

Definition 14.2. A continuous random variable X is said to have a
t-distribution with $\nu$ degrees of freedom if its probability density
function is of the form

\[ f(x; \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi \nu}\;\Gamma\left(\frac{\nu}{2}\right) \left(1 + \frac{x^2}{\nu}\right)^{\frac{\nu+1}{2}}}, \qquad -\infty < x < \infty, \]

where $\nu > 0$. If X has a t-distribution with $\nu$ degrees of freedom, then
we denote it by writing $X \sim t(\nu)$.

The t-distribution was discovered by W. S. Gosset (1876-1937) of England,
who published his work under the pseudonym of Student. Therefore,
this distribution is known as Student's t-distribution. This distribution is a
generalization of the Cauchy distribution and the normal distribution. That
is, if $\nu = 1$, then the probability density function of X becomes

\[ f(x; 1) = \frac{1}{\pi\,(1 + x^2)}, \qquad -\infty < x < \infty, \]

which is the Cauchy distribution. Further, if $\nu \to \infty$, then

\[ \lim_{\nu \to \infty} f(x; \nu) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} x^2}, \qquad -\infty < x < \infty, \]

which is the probability density function of the standard normal distribution.
The following figure shows the graph of t-distributions with various degrees
of freedom.


[Figure: Student's t distributions for various degrees of freedom]
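The two limiting cases of the t density can be checked numerically. The following is a minimal sketch under the density of Definition 14.2; the helper names `t_pdf`, `cauchy_pdf` and `std_normal_pdf` are illustrative choices, and a large value such as $\nu = 10^6$ merely stands in for the limit $\nu \to \infty$.

```python
import math

def t_pdf(x, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    # Work on the log scale so that large nu does not overflow the gamma function.
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(math.pi * nu))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def cauchy_pdf(x):
    return 1.0 / (math.pi * (1.0 + x * x))

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

for x in (0.0, 0.7, 2.0):
    # Columns: x, t(1) vs Cauchy, t(10^6) vs standard normal.
    print(x, t_pdf(x, 1), cauchy_pdf(x), t_pdf(x, 1e6), std_normal_pdf(x))
```

For $\nu = 1$ the two columns should agree to rounding error, while for large $\nu$ the agreement with the normal density is only approximate.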



Example 14.6. If $T \sim t(10)$, then what is the probability that T is at least
2.228?

Answer: The probability that T is at least 2.228 is given by

\begin{align*}
P(T \ge 2.228) &= 1 - P(T < 2.228) \\
               &= 1 - 0.975 \qquad \text{(from the t-table)} \\
               &= 0.025.
\end{align*}

Example 14.7. If $T \sim t(19)$, then what is the value of the constant c such
that $P(|T| \le c) = 0.95$?

Answer:

\begin{align*}
0.95 &= P(|T| \le c) \\
     &= P(-c \le T \le c) \\
     &= P(T \le c) - 1 + P(T \le c) \\
     &= 2\,P(T \le c) - 1.
\end{align*}

Hence

\[ P(T \le c) = 0.975. \]

Thus, using the t-table, we get for 19 degrees of freedom

\[ c = 2.093. \]
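When a t-table is not at hand, the probabilities in Examples 14.6 and 14.7 can be approximated by integrating the density of Definition 14.2. The sketch below assumes composite Simpson's rule on $[0, x]$ together with the symmetry of the density; `t_pdf` and `t_cdf` are hypothetical helper names.

```python
import math

def t_pdf(x, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(math.pi * nu))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_cdf(x, nu, steps=2000):
    """P(T <= x) for x >= 0: one half plus Simpson's rule on [0, x] (steps even)."""
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3.0

upper_tail = 1.0 - t_cdf(2.228, 10)        # Example 14.6
central = 2.0 * t_cdf(2.093, 19) - 1.0     # Example 14.7
print(upper_tail, central)
```

The two printed values should be close to 0.025 and 0.95 respectively; small differences come from the three-decimal rounding of the table entries.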


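Theorem 14.5. If the random variable X has a t-distribution with $\nu$ degrees
of freedom, then

\[ E[X] = \begin{cases} 0 & \text{if } \nu \ge 2 \\ DNE & \text{if } \nu = 1 \end{cases} \]

and

\[ Var[X] = \begin{cases} \dfrac{\nu}{\nu - 2} & \text{if } \nu \ge 3 \\ DNE & \text{if } \nu = 1, 2 \end{cases} \]

where DNE means does not exist.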


Theorem 14.6. If $Z \sim N(0,1)$ and $U \sim \chi^2(\nu)$ and in addition, Z and U
are independent, then the random variable W defined by

\[ W = \frac{Z}{\sqrt{U/\nu}} \]

has a t-distribution with $\nu$ degrees of freedom.
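Theorem 14.7. If $X \sim N(\mu, \sigma^2)$ and $X_1, X_2, ..., X_n$ is a random sample
from the population X, then

\[ \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1). \]

Proof: Since each $X_i \sim N(\mu, \sigma^2)$,

\[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right). \]

Thus,

\[ \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1). \]

Further, from Theorem 14.3 we know that

\[ \frac{(n-1)\,S^2}{\sigma^2} \sim \chi^2(n-1). \]

Hence, since $\bar{X}$ and $S^2$ are independent for a normal sample,

\[ \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)\,S^2}{(n-1)\,\sigma^2}}} \sim t(n-1) \qquad \text{(by Theorem 14.6)}. \]

This completes the proof of the theorem.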

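Example 14.8. Let $X_1, X_2, X_3, X_4$ be a random sample of size 4 from a
standard normal distribution. If the statistic W is given by

\[ W = \frac{X_1 - X_2 + X_3}{\sqrt{X_1^2 + X_2^2 + X_3^2 + X_4^2}}, \]

then what is the expected value of W?

Answer: Since $X_i \sim N(0,1)$, we get

\[ X_1 - X_2 + X_3 \sim N(0, 3) \]

and

\[ \frac{X_1 - X_2 + X_3}{\sqrt{3}} \sim N(0, 1). \]

Further, since $X_i \sim N(0,1)$, we have

\[ X_i^2 \sim \chi^2(1) \]

and hence

\[ X_1^2 + X_2^2 + X_3^2 + X_4^2 \sim \chi^2(4). \]

Thus,

\[ \frac{\dfrac{X_1 - X_2 + X_3}{\sqrt{3}}}{\sqrt{\dfrac{X_1^2 + X_2^2 + X_3^2 + X_4^2}{4}}} = \left(\frac{2}{\sqrt{3}}\right) W \sim t(4). \]

Now using the distribution of W, we find the expected value of W.

\begin{align*}
E[W] &= \left(\frac{\sqrt{3}}{2}\right) E\left[ \frac{2}{\sqrt{3}}\, W \right] \\
     &= \left(\frac{\sqrt{3}}{2}\right) E[t(4)] \\
     &= \left(\frac{\sqrt{3}}{2}\right) 0 \\
     &= 0.
\end{align*}

Note that Theorem 14.6 requires the numerator and the denominator to be
independent, which does not hold here because both involve $X_1, X_2, X_3$, so
the t(4) statement should be read as heuristic. The value $E[W] = 0$ is
nevertheless correct: $|W| \le \sqrt{3}$ by the Cauchy-Schwarz inequality, and W
and $-W$ have the same distribution.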


Example 14.9. If $X \sim N(0,1)$ and $X_1, X_2$ is a random sample of size two from
the population X, then what is the 75th percentile of the statistic
$W = \frac{X_1}{\sqrt{X_2^2}}$?

Answer: Since each $X_i \sim N(0,1)$, we have

\[ X_1 \sim N(0, 1) \qquad \text{and} \qquad X_2^2 \sim \chi^2(1). \]

Hence

\[ W = \frac{X_1}{\sqrt{X_2^2}} \sim t(1). \]

The 75th percentile a of W is then given by

\[ 0.75 = P(W \le a). \]

Hence, from the t-table, we get

\[ a = 1.0. \]

Hence the 75th percentile of W is 1.0.
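Since t(1) is the Cauchy distribution, the percentile in Example 14.9 also has a closed form. The sketch below assumes the standard Cauchy distribution function $F(x) = \frac{1}{2} + \frac{1}{\pi}\arctan x$; the function names are illustrative only.

```python
import math

def cauchy_cdf(x):
    """Distribution function of t(1), the standard Cauchy distribution."""
    return 0.5 + math.atan(x) / math.pi

def cauchy_quantile(p):
    """Inverse of cauchy_cdf for 0 < p < 1."""
    return math.tan(math.pi * (p - 0.5))

a = cauchy_quantile(0.75)
print(a, cauchy_cdf(1.0))
```

The quantile at 0.75 is $\tan(\pi/4) = 1$, in agreement with the table value.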

Example 14.10. Suppose $X_1, X_2, ..., X_n$ is a random sample from a normal
distribution with mean $\mu$ and variance $\sigma^2$. If $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ and $V^2 =$

\begin{align*}
(n-1)\,S^2 &= (n-1)\, \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \\
           &= \sum_{i=1}^{n} (X_i - \bar{X})^2 \\
           &= n\,V^2.
\end{align*}



Thus 

( F^\ x-x n+l 

n+1) V 
Thus by comparison, we get 
14.3. Snedecor's F-distribution


The next sampling distribution to be discussed in this chapter is
Snedecor's F-distribution. This distribution has many applications in
mathematical statistics. In the analysis of variance, this distribution is used to
develop the technique for testing the equality of population means.

Definition 14.3. A continuous random variable X is said to have an
F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom if its probability density
function is of the form

\[
f(x; \nu_1, \nu_2) =
\begin{cases}
\dfrac{\Gamma\left(\frac{\nu_1 + \nu_2}{2}\right) \left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} x^{\frac{\nu_1}{2} - 1}}{\Gamma\left(\frac{\nu_1}{2}\right) \Gamma\left(\frac{\nu_2}{2}\right) \left(1 + \frac{\nu_1}{\nu_2}\, x\right)^{\frac{\nu_1 + \nu_2}{2}}} & \text{if } 0 \le x < \infty \\
0 & \text{otherwise,}
\end{cases}
\]

where $\nu_1, \nu_2 > 0$. If X has an F-distribution with $\nu_1$ and $\nu_2$ degrees of
freedom, then we denote it by writing $X \sim F(\nu_1, \nu_2)$.

The F-distribution was named in honor of Sir Ronald Fisher by George
Snedecor. The F-distribution arises as the distribution of a ratio of variances.
Like the other two distributions, this distribution also tends to the normal
distribution as $\nu_1$ and $\nu_2$ become very large. The following figure illustrates the
shape of the graph of this distribution for various degrees of freedom.



The following theorem gives us the mean and variance of Snedecor's
F-distribution.

Theorem 14.8. If the random variable $X \sim F(\nu_1, \nu_2)$, then

\[ E[X] = \begin{cases} \dfrac{\nu_2}{\nu_2 - 2} & \text{if } \nu_2 \ge 3 \\ DNE & \text{if } \nu_2 = 1, 2 \end{cases} \]

and

\[ Var[X] = \begin{cases} \dfrac{2\,\nu_2^2\,(\nu_1 + \nu_2 - 2)}{\nu_1\,(\nu_2 - 2)^2\,(\nu_2 - 4)} & \text{if } \nu_2 \ge 5 \\ DNE & \text{if } \nu_2 = 1, 2, 3, 4. \end{cases} \]

Here DNE means does not exist.

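Example 14.11. If $X \sim F(9, 10)$, what is $P(X \ge 3.02)$? Also, find the mean
and variance of X.

Answer:

\begin{align*}
P(X \ge 3.02) &= 1 - P(X \le 3.02) \\
              &= 1 - P(F(9,10) \le 3.02) \\
              &= 1 - 0.95 \qquad \text{(from the F-table)} \\
              &= 0.05.
\end{align*}

Next, we determine the mean and variance of X using Theorem 14.8.
Hence,

\[ E(X) = \frac{\nu_2}{\nu_2 - 2} = \frac{10}{10 - 2} = \frac{10}{8} = 1.25 \]

and

\begin{align*}
Var(X) &= \frac{2\,\nu_2^2\,(\nu_1 + \nu_2 - 2)}{\nu_1\,(\nu_2 - 2)^2\,(\nu_2 - 4)} \\
       &= \frac{2\,(10)^2\,(19 - 2)}{9\,(8)^2\,(6)} \\
       &= \frac{(25)\,(17)}{(27)\,(16)} \\
       &= \frac{425}{432} \\
       &= 0.9838.
\end{align*}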
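The arithmetic in Example 14.11 is easy to mistype, so a small exact check may be useful. The sketch below simply encodes the formulas of Theorem 14.8 for integer degrees of freedom; the function names `f_mean` and `f_variance` are hypothetical, and `None` is used here as a stand-in for DNE.

```python
from fractions import Fraction

def f_mean(nu1, nu2):
    """Mean of F(nu1, nu2) for integer degrees of freedom (None when it does not exist)."""
    if nu2 < 3:
        return None
    return Fraction(nu2, nu2 - 2)

def f_variance(nu1, nu2):
    """Variance of F(nu1, nu2) for integer degrees of freedom (None when it does not exist)."""
    if nu2 < 5:
        return None
    return Fraction(2 * nu2 ** 2 * (nu1 + nu2 - 2),
                    nu1 * (nu2 - 2) ** 2 * (nu2 - 4))

mean = f_mean(9, 10)
var = f_variance(9, 10)
print(mean, var, float(var))
```

Working with `Fraction` keeps the intermediate values exact, so the reduced fraction 425/432 appears directly before it is converted to a decimal.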


Theorem 14.9. If $X \sim F(\nu_1, \nu_2)$, then the random variable $\frac{1}{X} \sim F(\nu_2, \nu_1)$.

This theorem is very useful for computing probabilities like $P(X \le
0.2439)$. If you look at an F-table, you will notice that the table starts with
values bigger than 1. Our next example illustrates how to find such probabilities
using Theorem 14.9.

Example 14.12. If $X \sim F(6, 9)$, what is the probability that X is less than
or equal to 0.2439?

Answer: We use the above theorem to compute

\begin{align*}
P(X \le 0.2439) &= P\left( \frac{1}{X} \ge \frac{1}{0.2439} \right) \\
                &= P\left( F(9,6) \ge \frac{1}{0.2439} \right) \qquad \text{(by Theorem 14.9)} \\
                &= 1 - P\left( F(9,6) \le \frac{1}{0.2439} \right) \\
                &= 1 - P\left( F(9,6) \le 4.10 \right) \\
                &= 1 - 0.95 \\
                &= 0.05.
\end{align*}
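The reciprocal relation used in Example 14.12 can be checked without a table by integrating the density of Definition 14.3. This is only a rough numerical sketch: it assumes composite Simpson's rule, it assumes $\nu_1 > 2$ so that the density vanishes at the origin, and the names `f_pdf` and `f_cdf` are hypothetical.

```python
import math

def f_pdf(x, nu1, nu2):
    """Density of F(nu1, nu2); the value 0 at x = 0 is valid only for nu1 > 2."""
    if x <= 0.0:
        return 0.0
    log_c = (math.lgamma((nu1 + nu2) / 2) - math.lgamma(nu1 / 2)
             - math.lgamma(nu2 / 2) + (nu1 / 2) * math.log(nu1 / nu2))
    return math.exp(log_c + (nu1 / 2 - 1) * math.log(x)
                    - (nu1 + nu2) / 2 * math.log1p(nu1 * x / nu2))

def f_cdf(x, nu1, nu2, steps=20000):
    """P(F <= x) by composite Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = f_pdf(0.0, nu1, nu2) + f_pdf(x, nu1, nu2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, nu1, nu2)
    return total * h / 3.0

a = 0.2439
direct = f_cdf(a, 6, 9)                     # P(F(6,9) <= a)
via_reciprocal = 1.0 - f_cdf(1.0 / a, 9, 6) # 1 - P(F(9,6) <= 1/a)
print(direct, via_reciprocal)
```

Both routes should give essentially the same number, close to the table-based answer 0.05.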

The following theorem says that the F-distribution arises as the distribution
of a random variable which is the quotient of two independently distributed
chi-square random variables, each of which is divided by its degrees of
freedom.

Theorem 14.10. If $U \sim \chi^2(\nu_1)$ and $V \sim \chi^2(\nu_2)$, and the random variables
U and V are independent, then the random variable

\[ \frac{U/\nu_1}{V/\nu_2} \sim F(\nu_1, \nu_2). \]


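Example 14.13. Let $X_1, X_2, ..., X_4$ and $Y_1, Y_2, ..., Y_5$ be two random samples
of size 4 and 5 respectively, from a standard normal population. What is the
variance of the statistic $T = \left(\frac{5}{4}\right) \frac{X_1^2 + X_2^2 + X_3^2 + X_4^2}{Y_1^2 + Y_2^2 + Y_3^2 + Y_4^2 + Y_5^2}$?

Answer: Since the population is standard normal, we get

\[ X_1^2 + X_2^2 + X_3^2 + X_4^2 \sim \chi^2(4). \]

Similarly,

\[ Y_1^2 + Y_2^2 + Y_3^2 + Y_4^2 + Y_5^2 \sim \chi^2(5). \]

Thus

\[ T = \left(\frac{5}{4}\right) \frac{X_1^2 + X_2^2 + X_3^2 + X_4^2}{Y_1^2 + Y_2^2 + Y_3^2 + Y_4^2 + Y_5^2} = \frac{\dfrac{X_1^2 + X_2^2 + X_3^2 + X_4^2}{4}}{\dfrac{Y_1^2 + Y_2^2 + Y_3^2 + Y_4^2 + Y_5^2}{5}} \sim F(4, 5). \]

Therefore

\begin{align*}
Var(T) &= Var[F(4,5)] \\
       &= \frac{2\,(5)^2\,(7)}{4\,(3)^2\,(1)} \\
       &= \frac{350}{36} \\
       &= 9.72.
\end{align*}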


Theorem 14.11. Let $X \sim N(\mu_1, \sigma_1^2)$ and $X_1, X_2, ..., X_n$ be a random
sample of size n from the population X. Let $Y \sim N(\mu_2, \sigma_2^2)$ and $Y_1, Y_2, ..., Y_m$
be a random sample of size m from the population Y, independent of the
first sample. Then the statistic

\[ \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n-1, m-1), \]

where $S_1^2$ and $S_2^2$ denote the sample variances of the first and the second
sample, respectively.

Proof: Since

\[ X_i \sim N(\mu_1, \sigma_1^2), \]

we have by Theorem 14.3

\[ (n-1)\, \frac{S_1^2}{\sigma_1^2} \sim \chi^2(n-1). \]

Similarly, since

\[ Y_i \sim N(\mu_2, \sigma_2^2), \]

we have by Theorem 14.3

\[ (m-1)\, \frac{S_2^2}{\sigma_2^2} \sim \chi^2(m-1). \]

Therefore

\[ \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\dfrac{(n-1)\,S_1^2}{(n-1)\,\sigma_1^2}}{\dfrac{(m-1)\,S_2^2}{(m-1)\,\sigma_2^2}} \sim F(n-1, m-1). \]

This completes the proof of the theorem.


Because of this theorem, the F-distribution is also known as the
variance-ratio distribution.
14.4. Review Exercises


1. Let $X_1, X_2, ..., X_5$ be a random sample of size 5 from a normal distribution
with mean zero and standard deviation 2. Find the sampling distribution of
the statistic $X_1 + 2X_2 - X_3 + X_4 + X_5$.

2. Let $X_1, X_2, X_3$ be a random sample of size 3 from a standard normal
distribution. Find the distribution of $X_1^2 + X_2^2 + X_3^2$. If possible, find the
sampling distribution of $X_1^2 - X_2^2$. If not, justify why you can not determine
its distribution.


3. Let Xi,X 2 , ...,Xq be a random sample of size 6 from a standard normal 
distribution. Find the sampling distribution of the statistics -4 ===l 

6 V x t+ x l + x2 6 


and 


-Yl— A~2 — A~3 

\/ x I+ x i+ x e 


4. Let $X_1, X_2, X_3$ be a random sample of size 3 from an exponential
distribution with a parameter $\theta > 0$. Find the distribution of the sample (that is
the joint distribution of the random variables $X_1, X_2, X_3$).


5. Let $X_1, X_2, ..., X_n$ be a random sample of size n from a normal population
with mean $\mu$ and variance $\sigma^2 > 0$. What is the expected value of the sample
variance $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$?

6. Let $X_1, X_2, X_3, X_4$ be a random sample of size 4 from a standard normal

population. Find the distribution of the statistic 'Y 1 j~ A ' 4 , . 

V- Y 2 + X 3 

7. Let $X_1, X_2, X_3, X_4$ be a random sample of size 4 from a standard normal
population. Find the sampling distribution (if possible) and moment
generating function of the statistic $2X_1^2 + 3X_2^2 + X_3^2 + 4X_4^2$. What is the probability
distribution of the sample?


8. Let X equal the maximal oxygen intake of a human on a treadmill, where
the measurements are in milliliters of oxygen per minute per kilogram of
weight. Assume that for a particular population the mean of X is $\mu = 54.03$
and the standard deviation is $\sigma = 5.8$. Let $\bar{X}$ be the sample mean of a random
sample $X_1, X_2, ..., X_{47}$ of size 47 drawn from X. Find the probability that
the sample mean is between 52.761 and 54.453.

9. Let $X_1, X_2, ..., X_n$ be a random sample from a normal distribution with
mean $\mu$ and variance $\sigma^2$. What is the variance of $V^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$?

10. If X is a random variable with mean $\mu$ and variance $\sigma^2$, then $\mu - 2\sigma$ is
called the lower $2\sigma$ point of X. Suppose a random sample $X_1, X_2, X_3, X_4$ is
drawn from a chi-square distribution with two degrees of freedom. What is
the lower $2\sigma$ point of $X_1 + X_2 + X_3 + X_4$?

11. Let X and Y be independent normal random variables such that the
mean and variance of X are 2 and 4, respectively, while the mean and
variance of Y are 6 and k, respectively. A sample of size 4 is taken from the
X-distribution and a sample of size 9 is taken from the Y-distribution. If
$P(\bar{Y} - \bar{X} > 8) = 0.0228$, then what is the value of the constant k?

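12. Let $X_1, X_2, ..., X_n$ be a random sample of size n from a distribution with
density function

\[ f(x; \lambda) = \begin{cases} \lambda\, e^{-\lambda x} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise.} \end{cases} \]

What is the distribution of the statistic $Y = 2\lambda \sum_{i=1}^{n} X_i$?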


13. Suppose X has a normal distribution with mean 0 and variance 1, Y 
has a chi-square distribution with n degrees of freedom, W has a chi-square 


distribution with p degrees of freedom, and W, X, and Y are independent. 


What is the sampling distribution of the statistic V = 



14. A random sample $X_1, X_2, ..., X_n$ of size n is selected from a normal
population with mean $\mu$ and standard deviation 1. Later an additional
independent observation $X_{n+1}$ is obtained from the same population. What
is the distribution of the statistic $(X_{n+1} - \mu)^2 + \sum_{i=1}^{n} (X_i - \bar{X})^2$, where $\bar{X}$
denotes the sample mean?


15. Let T = where X, Y, Z , and W are independent normal 

random variables with mean 0 and variance a 2 > 0. For exactly one value 
of k, T has a t-distribution. If r denotes the degrees of freedom of that 
distribution, then what is the value of the pair (k,r )? 


16. Let X and Y be joint normal random variables with common mean 0, 
common variance 1, and covariance What is the probability of the event 
(X + Y < V3), that is P (X + Y < V3)? 

17. Suppose $X_j = Z_j - Z_{j-1}$, where $j = 1, 2, ..., n$ and $Z_0, Z_1, ..., Z_n$ are
independent and identically distributed with common variance $\sigma^2$. What is
the variance of the random variable -F" , X',- ? 

18. A random sample of size 5 is taken from a normal distribution with mean
0 and standard deviation 2. Find the constant k such that 0.05 is equal to the
probability that the sum of the squares of the sample observations exceeds
the constant k.

19. Let $X_1, X_2, ..., X_n$ and $Y_1, Y_2, ..., Y_n$ be two random samples from
independent normal distributions with $Var[X_i] = \sigma^2$ and $Var[Y_i] = 2\sigma^2$, for
$i = 1, 2, ..., n$ and $\sigma^2 > 0$. If $U = \sum_{i=1}^{n} (X_i - \bar{X})^2$ and $V = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$,
then what is the sampling distribution of the statistic 2 ^ v ? 

20. Suppose Xi,X 2 , ...,X e and Y\, Y 2 , ...., Yg are independent, identically 
distributed normal random variables, each with mean zero and variance cr 2 > 


0. What is the 95 th percentile of the statistics W = 


/ 

1 

Sr 

1_ 


j=i 


L,=i J 


21. Let $X_1, X_2, ..., X_6$ and $Y_1, Y_2, ..., Y_8$ be independent random
samples from a normal distribution with mean 0 and variance 1, and Z =


6 


r 8 1 

4 E ^ 2 

i= 1 

/ 

1 

<M_. 

CO 

_ 1 


22. Give a proof of Theorem 14.9. 
Chapter 15

SOME TECHNIQUES FOR FINDING POINT ESTIMATORS OF PARAMETERS


A statistical population consists of all the measurements of interest in
a statistical investigation. Usually a population is described by a random
variable X. If we can gain some knowledge about the probability density
function $f(x; \theta)$ of X, then we also gain some knowledge about the population
under investigation.

A sample is a portion of the population usually chosen by the method of
random sampling and as such it is a set of random variables $X_1, X_2, ..., X_n$
with the same probability density function $f(x; \theta)$ as the population. Once
the sampling is done, we get

\[ X_1 = x_1, \; X_2 = x_2, \; \ldots, \; X_n = x_n \]

where $x_1, x_2, ..., x_n$ are the sample data.

Every statistical method employs a random sample to gain information
about the population. Since the population is characterized by the probability
density function $f(x; \theta)$, in statistics one makes statistical inferences
about the population distribution $f(x; \theta)$ based on sample information. A
statistical inference is a statement based on sample information about the
population. There are three types of statistical inferences: (1) estimation, (2)
hypothesis testing, and (3) prediction. The goal of this chapter is to examine
some well known point estimation methods.

In point estimation, we try to find the parameter $\theta$ of the population
distribution $f(x; \theta)$ from the sample information. Thus, in parametric
point estimation one assumes the functional form of the pdf $f(x; \theta)$ to be
known and only estimates the unknown parameter $\theta$ of the population using
information available from the sample.

Definition 15.1. Let X be a population with the density function $f(x; \theta)$,
where $\theta$ is an unknown parameter. The set of all admissible values of $\theta$ is
called a parameter space and it is denoted by $\Omega$, that is

\[ \Omega = \{ \theta \in \mathbb{R}^m \mid f(x; \theta) \text{ is a pdf} \} \]

for some natural number m.

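Example 15.1. If $X \sim EXP(\theta)$, then what is the parameter space of $\theta$?

Answer: Since $X \sim EXP(\theta)$, the density function of X is given by

\[ f(x; \theta) = \frac{1}{\theta}\, e^{-\frac{x}{\theta}}. \]

If $\theta$ is zero or negative then $f(x; \theta)$ is not a density function. Thus, the
admissible values of $\theta$ are all the positive real numbers. Hence

\begin{align*}
\Omega &= \{ \theta \in \mathbb{R} \mid 0 < \theta < \infty \} \\
       &= \mathbb{R}_+.
\end{align*}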

Example 15.2. If $X \sim N(\mu, \sigma^2)$, what is the parameter space?

Answer: The parameter space $\Omega$ is given by

\begin{align*}
\Omega &= \{ \theta \in \mathbb{R}^2 \mid f(x; \theta) \sim N(\mu, \sigma^2) \} \\
       &= \{ (\mu, \sigma) \in \mathbb{R}^2 \mid -\infty < \mu < \infty, \; 0 < \sigma < \infty \} \\
       &= \mathbb{R} \times \mathbb{R}_+ \\
       &= \text{upper half plane.}
\end{align*}

In general, a parameter space is a subset of $\mathbb{R}^m$. Statistics is concerned
with the estimation of the unknown parameter $\theta$ from a random sample
$X_1, X_2, ..., X_n$. Recall that a statistic is a function of $X_1, X_2, ..., X_n$ and free
of the population parameter $\theta$.
Definition 15.2. Let $X \sim f(x; \theta)$ and $X_1, X_2, ..., X_n$ be a random sample
from the population X. Any statistic that can be used to guess the parameter
$\theta$ is called an estimator of $\theta$. The numerical value of this statistic is called
an estimate of $\theta$. The estimator of the parameter $\theta$ is denoted by $\hat{\theta}$.

One of the basic problems is how to find an estimator of the population
parameter $\theta$. There are several methods for finding an estimator of $\theta$. Some
of these methods are:

(1) Moment Method 

(2) Maximum Likelihood Method 

(3) Bayes Method 

(4) Least Squares Method 

(5) Minimum Chi-Squares Method 

(6) Minimum Distance Method 

In this chapter, we only discuss the first three methods of estimating a 
population parameter. 

15.1. Moment Method 

Let $X_1, X_2, ..., X_n$ be a random sample from a population X with probability
density function $f(x; \theta_1, \theta_2, ..., \theta_m)$, where $\theta_1, \theta_2, ..., \theta_m$ are m unknown
parameters. Let

\[ E\left(X^k\right) = \int_{-\infty}^{\infty} x^k f(x; \theta_1, \theta_2, ..., \theta_m)\, dx \]

be the k-th population moment about 0. Further, let

\[ M_k = \frac{1}{n} \sum_{i=1}^{n} X_i^k \]

be the k-th sample moment about 0.

In the moment method, we find the estimators for the parameters
$\theta_1, \theta_2, ..., \theta_m$ by equating the first m population moments (if they exist) to
the first m sample moments, that is

\begin{align*}
E(X) &= M_1 \\
E\left(X^2\right) &= M_2 \\
E\left(X^3\right) &= M_3 \\
&\;\;\vdots \\
E\left(X^m\right) &= M_m.
\end{align*}
The moment method is one of the classical methods for estimating
parameters, and its motivation comes from the fact that the sample moments are
in some sense estimates for the population moments. The moment method
was first discovered by the British statistician Karl Pearson in 1902. Now we
provide some examples to illustrate this method.

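Example 15.3. Let $X \sim N(\mu, \sigma^2)$ and $X_1, X_2, ..., X_n$ be a random sample
of size n from the population X. What are the estimators of the population
parameters $\mu$ and $\sigma^2$ if we use the moment method?

Answer: Since the population is normal, that is

\[ X \sim N(\mu, \sigma^2), \]

we know that

\begin{align*}
E(X) &= \mu \\
E\left(X^2\right) &= \sigma^2 + \mu^2.
\end{align*}

Hence

\begin{align*}
\mu &= E(X) \\
    &= M_1 \\
    &= \frac{1}{n} \sum_{i=1}^{n} X_i \\
    &= \bar{X}.
\end{align*}

Therefore, the estimator of the parameter $\mu$ is $\bar{X}$, that is

\[ \hat{\mu} = \bar{X}. \]

Next, we find the estimator of $\sigma^2$ by equating $E(X^2)$ to $M_2$. Note that

\begin{align*}
\sigma^2 &= \sigma^2 + \mu^2 - \mu^2 \\
         &= E\left(X^2\right) - \mu^2 \\
         &= M_2 - \hat{\mu}^2 \\
         &= \frac{1}{n} \sum_{i=1}^{n} X_i^2 - \bar{X}^2 \\
         &= \frac{1}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2.
\end{align*}

The last line follows from the fact that

\begin{align*}
\frac{1}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2
  &= \frac{1}{n} \sum_{i=1}^{n} \left(X_i^2 - 2 X_i \bar{X} + \bar{X}^2\right) \\
  &= \frac{1}{n} \sum_{i=1}^{n} X_i^2 - \frac{1}{n} \sum_{i=1}^{n} 2 X_i \bar{X} + \frac{1}{n} \sum_{i=1}^{n} \bar{X}^2 \\
  &= \frac{1}{n} \sum_{i=1}^{n} X_i^2 - 2 \bar{X}\, \frac{1}{n} \sum_{i=1}^{n} X_i + \bar{X}^2 \\
  &= \frac{1}{n} \sum_{i=1}^{n} X_i^2 - 2 \bar{X}\, \bar{X} + \bar{X}^2 \\
  &= \frac{1}{n} \sum_{i=1}^{n} X_i^2 - \bar{X}^2.
\end{align*}

Thus, the estimator of $\sigma^2$ is $\frac{1}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2$, that is

\[ \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2. \]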


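Example 15.4. Let $X_1, X_2, ..., X_n$ be a random sample of size n from a
population X with probability density function

\[ f(x; \theta) = \begin{cases} \theta\, x^{\theta - 1} & \text{if } 0 < x < 1 \\ 0 & \text{otherwise,} \end{cases} \]

where $0 < \theta < \infty$ is an unknown parameter. Using the method of moments,
find an estimator of $\theta$. If $x_1 = 0.2$, $x_2 = 0.6$, $x_3 = 0.5$, $x_4 = 0.3$ is a random
sample of size 4, then what is the estimate of $\theta$?

Answer: To find an estimator, we shall equate the population moment to
the sample moment. The population moment E(X) is given by

\begin{align*}
E(X) &= \int_0^1 x\, f(x; \theta)\, dx \\
     &= \int_0^1 x\, \theta\, x^{\theta - 1}\, dx \\
     &= \theta \int_0^1 x^{\theta}\, dx \\
     &= \frac{\theta}{\theta + 1} \left[ x^{\theta + 1} \right]_0^1 \\
     &= \frac{\theta}{\theta + 1}.
\end{align*}

We know that $M_1 = \bar{X}$. Now setting $M_1$ equal to E(X) and solving for $\theta$,
we get

\[ \bar{X} = \frac{\theta}{\theta + 1}, \]

that is

\[ \theta = \frac{\bar{X}}{1 - \bar{X}}, \]

where $\bar{X}$ is the sample mean. Thus, the statistic $\frac{\bar{X}}{1 - \bar{X}}$ is an estimator of the
parameter $\theta$. Hence

\[ \hat{\theta} = \frac{\bar{X}}{1 - \bar{X}}. \]

Since $x_1 = 0.2$, $x_2 = 0.6$, $x_3 = 0.5$, $x_4 = 0.3$, we have $\bar{x} = 0.4$ and

\[ \hat{\theta} = \frac{0.4}{1 - 0.4} = \frac{2}{3} \]

is an estimate of $\theta$.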
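The estimate in Example 15.4 can be reproduced in a couple of lines. The sketch below just evaluates $\bar{x}/(1 - \bar{x})$ for the four observations given in the example; the function name `moment_estimate_theta` is a hypothetical choice.

```python
from statistics import fmean

def moment_estimate_theta(sample):
    """Moment estimate of theta for the density theta * x**(theta - 1) on (0, 1)."""
    xbar = fmean(sample)
    return xbar / (1.0 - xbar)

theta_hat = moment_estimate_theta([0.2, 0.6, 0.5, 0.3])
print(theta_hat)
```

The printed value should match 2/3 up to floating-point rounding.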


Example 15.5. What is the basic principle of the moment method? 

Answer: To choose a value for the unknown population parameter for which 
the observed data have the same moments as the population. 

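Example 15.6. Suppose $X_1, X_2, ..., X_7$ is a random sample from a
population X with density function

\[ f(x; \beta) = \begin{cases} \dfrac{x^6\, e^{-\frac{x}{\beta}}}{\Gamma(7)\, \beta^7} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise.} \end{cases} \]

Find an estimator of $\beta$ by the moment method.

Answer: Since we have only one parameter, we need to compute only the
first population moment E(X) about 0. Thus,

\begin{align*}
E(X) &= \int_0^{\infty} x\, f(x; \beta)\, dx \\
     &= \int_0^{\infty} x\, \frac{x^6\, e^{-\frac{x}{\beta}}}{\Gamma(7)\, \beta^7}\, dx \\
     &= \frac{1}{\Gamma(7)} \int_0^{\infty} \left(\frac{x}{\beta}\right)^7 e^{-\frac{x}{\beta}}\, dx \\
     &= \beta\, \frac{1}{\Gamma(7)} \int_0^{\infty} y^7\, e^{-y}\, dy \\
     &= \beta\, \frac{1}{\Gamma(7)}\, \Gamma(8) \\
     &= 7\beta.
\end{align*}

Since $M_1 = \bar{X}$, equating E(X) to $M_1$, we get

\[ 7\beta = \bar{X}, \]

that is

\[ \beta = \frac{1}{7}\, \bar{X}. \]

Therefore, the estimator of $\beta$ by the moment method is given by

\[ \hat{\beta} = \frac{1}{7}\, \bar{X}. \]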


Example 15.7. Suppose $X_1, X_2, ..., X_n$ is a random sample from a
population X with density function

\[ f(x; \theta) = \begin{cases} \dfrac{1}{\theta} & \text{if } 0 < x < \theta \\ 0 & \text{otherwise.} \end{cases} \]

Find an estimator of $\theta$ by the moment method.

Answer: Examining the density function of the population X, we see that
$X \sim UNIF(0, \theta)$. Therefore

\[ E(X) = \frac{\theta}{2}. \]

Now, equating this population moment to the sample moment, we obtain

\[ \frac{\theta}{2} = E(X) = M_1 = \bar{X}. \]

Therefore, the estimator of $\theta$ is

\[ \hat{\theta} = 2\bar{X}. \]


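Example 15.8. Suppose $X_1, X_2, ..., X_n$ is a random sample from a
population X with density function

\[ f(x; \alpha, \beta) = \begin{cases} \dfrac{1}{\beta - \alpha} & \text{if } \alpha < x < \beta \\ 0 & \text{otherwise.} \end{cases} \]

Find the estimators of $\alpha$ and $\beta$ by the moment method.

Answer: Examining the density function of the population X, we see that
$X \sim UNIF(\alpha, \beta)$. Since the distribution has two unknown parameters, we
need the first two population moments. Therefore

\[ E(X) = \frac{\alpha + \beta}{2} \qquad \text{and} \qquad E\left(X^2\right) = \frac{(\beta - \alpha)^2}{12} + E(X)^2. \]

Equating these moments to the corresponding sample moments, we obtain

\[ \frac{\alpha + \beta}{2} = E(X) = M_1 = \bar{X}, \]

that is

\[ \alpha + \beta = 2\bar{X} \tag{1} \]

and

\[ \frac{(\beta - \alpha)^2}{12} + E(X)^2 = E\left(X^2\right) = M_2 = \frac{1}{n} \sum_{i=1}^{n} X_i^2, \]

which is

\begin{align*}
(\beta - \alpha)^2 &= 12 \left[ \frac{1}{n} \sum_{i=1}^{n} X_i^2 - E(X)^2 \right] \\
                   &= 12 \left[ \frac{1}{n} \sum_{i=1}^{n} X_i^2 - \bar{X}^2 \right] \\
                   &= 12 \left[ \frac{1}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2 \right].
\end{align*}

Hence, we get

\[ \beta - \alpha = \pm \sqrt{ \frac{12}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2 }. \tag{2} \]

Adding equation (1) to equation (2), we obtain

\[ 2\beta = 2\bar{X} \pm 2 \sqrt{ \frac{3}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2 }, \]

that is

\[ \beta = \bar{X} \pm \sqrt{ \frac{3}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2 }. \]

Similarly, subtracting (2) from (1), we get

\[ \alpha = \bar{X} \mp \sqrt{ \frac{3}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2 }. \]

Since $\alpha < \beta$, we see that the estimators of $\alpha$ and $\beta$ are

\[ \hat{\alpha} = \bar{X} - \sqrt{ \frac{3}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2 } \qquad \text{and} \qquad \hat{\beta} = \bar{X} + \sqrt{ \frac{3}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2 }. \]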
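The two moment estimators of Example 15.8 translate directly into code. The following sketch is illustrative only: the function name `uniform_moment_estimates` and the five data values are made up for the demonstration.

```python
import math
from statistics import fmean

def uniform_moment_estimates(sample):
    """Moment estimates (alpha_hat, beta_hat) for a UNIF(alpha, beta) population."""
    n = len(sample)
    xbar = fmean(sample)
    # Half-width sqrt((3/n) * sum (x_i - xbar)^2) from the formulas above.
    half_width = math.sqrt(3.0 / n * sum((x - xbar) ** 2 for x in sample))
    return xbar - half_width, xbar + half_width

alpha_hat, beta_hat = uniform_moment_estimates([1.0, 2.0, 3.0, 4.0, 5.0])
print(alpha_hat, beta_hat)
```

For this sample the mean is 3 and the half-width is $\sqrt{6}$. Note that moment estimates of this kind are not guaranteed to enclose every observation; for strongly skewed samples the interval $(\hat{\alpha}, \hat{\beta})$ can exclude the largest or smallest data point.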


15.2. Maximum Likelihood Method 

The maximum likelihood method was first used by Sir Ronald Fisher
in 1922 (see Fisher (1922)) for finding an estimator of an unknown parameter.
However, the method originated in the works of Gauss and Bernoulli. Next,
we describe the method in detail.

Definition 15.3. Let $X_1, X_2, ..., X_n$ be a random sample from a population
X with probability density function $f(x; \theta)$, where $\theta$ is an unknown
parameter. The likelihood function, $L(\theta)$, is the distribution of the sample. That
is

\[ L(\theta) = \prod_{i=1}^{n} f(x_i; \theta). \]

This definition says that the likelihood function of a random sample
$X_1, X_2, ..., X_n$ is the joint density of the random variables $X_1, X_2, ..., X_n$.

The $\theta$ that maximizes the likelihood function $L(\theta)$ is called the maximum
likelihood estimator of $\theta$, and it is denoted by $\hat{\theta}$. Hence

\[ \hat{\theta} = \operatorname{Arg} \sup_{\theta \in \Omega} L(\theta), \]

where $\Omega$ is the parameter space of $\theta$ so that $L(\theta)$ is the joint density of the
sample.

The method of maximum likelihood in a sense picks out of all the possible
values of $\theta$ the one most likely to have produced the given observations
$x_1, x_2, ..., x_n$. The method is summarized below: (1) Obtain a random sample
$x_1, x_2, ..., x_n$ from the distribution of a population X with probability density
function $f(x; \theta)$; (2) define the likelihood function for the sample $x_1, x_2, ..., x_n$
by $L(\theta) = f(x_1; \theta)\, f(x_2; \theta) \cdots f(x_n; \theta)$; (3) find the expression for $\theta$ that
maximizes $L(\theta)$. This can be done directly or by maximizing $\ln L(\theta)$; (4) replace
$\theta$ by $\hat{\theta}$ to obtain an expression for the maximum likelihood estimator for $\theta$;
(5) find the observed value of this estimator for a given sample.
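When step (3) cannot be carried out in closed form, the log-likelihood can be maximized numerically. The sketch below illustrates the five steps for the $EXP(\theta)$ density of Example 15.1, where setting the derivative of $\ln L(\theta) = -n \ln \theta - \frac{1}{\theta} \sum x_i$ to zero gives $\theta = \bar{x}$, so the numerical answer can be compared with it. The golden-section search, the function names and the data values are all illustrative assumptions, not part of the text.

```python
import math

def log_likelihood(theta, sample):
    """ln L(theta) for the density (1/theta) * exp(-x/theta), x > 0."""
    return -len(sample) * math.log(theta) - sum(sample) / theta

def maximize(func, lo, hi, tol=1e-8):
    """Golden-section search for the maximizer of a unimodal function on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if func(c) < func(d):
            a = c      # the maximum lies in [c, b]
        else:
            b = d      # the maximum lies in [a, d]
    return (a + b) / 2.0

sample = [120.0, 35.0, 210.0, 80.0, 55.0]   # made-up observations
theta_hat = maximize(lambda t: log_likelihood(t, sample), 1.0, 1000.0)
print(theta_hat, sum(sample) / len(sample))
```

The search relies on $\ln L(\theta)$ having a single peak on the bracketing interval, which holds for this density; for other models the bracket and the unimodality would have to be checked case by case.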
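Example 15.9. If $X_1, X_2, ..., X_n$ is a random sample from a distribution
with density function

\[ f(x; \theta) = \begin{cases} (1 - \theta)\, x^{-\theta} & \text{if } 0 < x < 1 \\ 0 & \text{elsewhere,} \end{cases} \]

what is the maximum likelihood estimator of $\theta$?

Answer: The likelihood function of the sample is given by

\[ L(\theta) = \prod_{i=1}^{n} f(x_i; \theta). \]

Therefore

\begin{align*}
\ln L(\theta) &= \ln \left( \prod_{i=1}^{n} f(x_i; \theta) \right) \\
              &= \sum_{i=1}^{n} \ln f(x_i; \theta) \\
              &= \sum_{i=1}^{n} \ln \left[ (1 - \theta)\, x_i^{-\theta} \right] \\
              &= n \ln(1 - \theta) - \theta \sum_{i=1}^{n} \ln x_i.
\end{align*}

Now we maximize $\ln L(\theta)$ with respect to $\theta$.

\begin{align*}
\frac{d \ln L(\theta)}{d\theta} &= \frac{d}{d\theta} \left( n \ln(1 - \theta) - \theta \sum_{i=1}^{n} \ln x_i \right) \\
                                &= -\frac{n}{1 - \theta} - \sum_{i=1}^{n} \ln x_i.
\end{align*}

Setting this derivative to 0, we get

\[ \frac{d \ln L(\theta)}{d\theta} = -\frac{n}{1 - \theta} - \sum_{i=1}^{n} \ln x_i = 0, \]

that is

\[ \frac{1}{1 - \theta} = -\frac{1}{n} \sum_{i=1}^{n} \ln x_i \]

or

\[ \frac{1}{1 - \theta} = -\frac{1}{n} \sum_{i=1}^{n} \ln x_i = -\overline{\ln x} \]

or

\[ \theta = 1 + \frac{1}{\overline{\ln x}}. \]

This $\theta$ can be shown to be a maximum by the second derivative test and we
leave this verification to the reader. Therefore, the estimator of $\theta$ is

\[ \hat{\theta} = 1 + \frac{1}{\overline{\ln X}}. \]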

Example 15.10. If $X_1, X_2, ..., X_n$ is a random sample from a distribution
with density function

\[ f(x; \beta) = \begin{cases} \dfrac{x^6\, e^{-\frac{x}{\beta}}}{\Gamma(7)\, \beta^7} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise,} \end{cases} \]

then what is the maximum likelihood estimator of $\beta$?

Answer: The likelihood function of the sample is given by

\[ L(\beta) = \prod_{i=1}^{n} f(x_i; \beta). \]

Thus,

\begin{align*}
\ln L(\beta) &= \sum_{i=1}^{n} \ln f(x_i; \beta) \\
             &= 6 \sum_{i=1}^{n} \ln x_i - \frac{1}{\beta} \sum_{i=1}^{n} x_i - n \ln(6!) - 7n \ln(\beta).
\end{align*}

Therefore

\[ \frac{d}{d\beta} \ln L(\beta) = \frac{1}{\beta^2} \sum_{i=1}^{n} x_i - \frac{7n}{\beta}. \]

Setting this derivative $\frac{d}{d\beta} \ln L(\beta)$ to zero, we get

\[ \frac{1}{\beta^2} \sum_{i=1}^{n} x_i - \frac{7n}{\beta} = 0, \]

which yields

\[ \beta = \frac{1}{7n} \sum_{i=1}^{n} x_i. \]

This $\beta$ can be shown to be a maximum by the second derivative test and again
we leave this verification to the reader. Hence, the estimator of $\beta$ is given by

\[ \hat{\beta} = \frac{1}{7}\, \bar{X}. \]

Remark 15.1. Note that this maximum likelihood estimator of $\beta$ is the same
as the one found for $\beta$ using the moment method in Example 15.6. However,
in general the estimators by different methods are different, as the following
example illustrates.

Example 15.11. If $X_1, X_2, ..., X_n$ is a random sample from a distribution
with density function

\[ f(x; \theta) = \begin{cases} \dfrac{1}{\theta} & \text{if } 0 < x < \theta \\ 0 & \text{otherwise,} \end{cases} \]

then what is the maximum likelihood estimator of $\theta$?

Answer: The likelihood function of the sample is given by

\begin{align*}
L(\theta) &= \prod_{i=1}^{n} f(x_i; \theta) \\
          &= \left(\frac{1}{\theta}\right)^n \qquad \theta > x_i \quad (i = 1, 2, 3, ..., n) \\
          &= \left(\frac{1}{\theta}\right)^n \qquad \theta > \max\{x_1, x_2, ..., x_n\}.
\end{align*}

Hence the parameter space of $\theta$ with respect to $L(\theta)$ is given by

\[ \Omega = \{ \theta \in \mathbb{R} \mid x_{max} < \theta < \infty \} = (x_{max}, \infty). \]

Now we maximize $L(\theta)$ on $\Omega$. First, we compute $\ln L(\theta)$ and then differentiate
it to get

\[ \ln L(\theta) = -n \ln(\theta) \]

and

\[ \frac{d}{d\theta} \ln L(\theta) = -\frac{n}{\theta} < 0. \]

Therefore $\ln L(\theta)$ is a decreasing function of $\theta$ and as such the maximum of
$\ln L(\theta)$ occurs at the left end point of the interval $(x_{max}, \infty)$. Therefore, at
$\theta = x_{max}$ the likelihood function achieves its maximum. Hence the maximum
likelihood estimator of $\theta$ is given by

\[ \hat{\theta} = X_{(n)}, \]

where $X_{(n)}$ denotes the n-th order statistic of the given sample.

Thus, Example 15.7 and Example 15.11 say that if we estimate the
parameter $\theta$ of a distribution with uniform density on the interval $(0, \theta)$, then
the maximum likelihood estimator is given by

\[ \hat{\theta} = X_{(n)} \]

whereas

\[ \hat{\theta} = 2\bar{X} \]

is the estimator obtained by the method of moments. Hence, in general these
two methods do not provide the same estimator of an unknown parameter.
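The difference between the two estimators for $UNIF(0, \theta)$ is easy to see on a small data set. The sketch below uses made-up observations and a hypothetical function name; it only evaluates $X_{(n)}$ and $2\bar{X}$ from Examples 15.11 and 15.7.

```python
from statistics import fmean

def uniform_theta_estimates(sample):
    """Return (maximum likelihood estimate, moment estimate) of theta for UNIF(0, theta)."""
    return max(sample), 2.0 * fmean(sample)

mle, mom = uniform_theta_estimates([0.9, 2.7, 1.4, 3.8, 0.6])
print(mle, mom)
```

For this particular sample the moment estimate is smaller than the largest observation, so it proposes a value of $\theta$ under which one of the data points would be impossible, whereas the maximum likelihood estimate never falls below the sample maximum.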

Example 15.12. Let $X_1, X_2, ..., X_n$ be a random sample from a distribution
with density function

\[ f(x; \theta) = \begin{cases} \sqrt{\dfrac{2}{\pi}}\; e^{-\frac{1}{2}(x - \theta)^2} & \text{if } x \ge \theta \\ 0 & \text{elsewhere.} \end{cases} \]

What is the maximum likelihood estimator of $\theta$?

Answer: The likelihood function $L(\theta)$ is given by

\[ L(\theta) = \left( \sqrt{\frac{2}{\pi}} \right)^n \prod_{i=1}^{n} e^{-\frac{1}{2}(x_i - \theta)^2}, \qquad x_i \ge \theta \quad (i = 1, 2, 3, ..., n). \]

Hence the parameter space of $\theta$ is given by

\[ \Omega = \{ \theta \in \mathbb{R} \mid 0 \le \theta \le x_{min} \} = [0, x_{min}], \]

where $x_{min} = \min\{x_1, x_2, ..., x_n\}$. Now we evaluate the logarithm of the
likelihood function.

\[ \ln L(\theta) = \frac{n}{2} \ln\left(\frac{2}{\pi}\right) - \frac{1}{2} \sum_{i=1}^{n} (x_i - \theta)^2, \]

where $\theta$ is on the interval $[0, x_{min}]$. Now we maximize $\ln L(\theta)$ subject to the
condition $0 \le \theta \le x_{min}$. Taking the derivative, we get

\[ \frac{d}{d\theta} \ln L(\theta) = -\frac{1}{2} \sum_{i=1}^{n} (x_i - \theta)\, 2\,(-1) = \sum_{i=1}^{n} (x_i - \theta). \]

In this example, if we equate the derivative to zero, then we get $\theta = \bar{x}$. But
this value of $\theta$ is not in the parameter space $\Omega$. Thus, $\theta = \bar{x}$ is not the
solution. Hence to find the solution of this optimization process, we examine
the behavior of $\ln L(\theta)$ on the interval $[0, x_{min}]$. Note that

\[ \frac{d}{d\theta} \ln L(\theta) = -\frac{1}{2} \sum_{i=1}^{n} (x_i - \theta)\, 2\,(-1) = \sum_{i=1}^{n} (x_i - \theta) > 0 \]

since each $x_i$ is bigger than $\theta$. Therefore, the function $\ln L(\theta)$ is an increasing
function on the interval $[0, x_{min}]$ and as such it will achieve its maximum at the
right end point of the interval $[0, x_{min}]$. Therefore, the maximum likelihood
estimator of $\theta$ is given by

\[ \hat{\theta} = X_{(1)}, \]

where $X_{(1)}$ denotes the smallest observation in the random sample
$X_1, X_2, ..., X_n$.

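Example 15.13. Let $X_1, X_2, ..., X_n$ be a random sample from a normal
population with mean $\mu$ and variance $\sigma^2$. What are the maximum likelihood
estimators of $\mu$ and $\sigma^2$?

Answer: Since $X \sim N(\mu, \sigma^2)$, the probability density function of X is given
by

\[ f(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}}\; e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}. \]

The likelihood function of the sample is given by

\[ L(\mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sigma \sqrt{2\pi}}\; e^{-\frac{1}{2} \left( \frac{x_i - \mu}{\sigma} \right)^2}. \]

Hence, the logarithm of this likelihood function is given by

\[ \ln L(\mu, \sigma) = -\frac{n}{2} \ln(2\pi) - n \ln(\sigma) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2. \]

Taking the partial derivatives of $\ln L(\mu, \sigma)$ with respect to $\mu$ and $\sigma$, we get

\[ \frac{\partial}{\partial \mu} \ln L(\mu, \sigma) = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)\,(-2) = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) \]

and

\[ \frac{\partial}{\partial \sigma} \ln L(\mu, \sigma) = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} (x_i - \mu)^2. \]

Setting $\frac{\partial}{\partial \mu} \ln L(\mu, \sigma) = 0$ and $\frac{\partial}{\partial \sigma} \ln L(\mu, \sigma) = 0$, and solving for the unknown
$\mu$ and $\sigma$, we get

\[ \mu = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}. \]

Thus the maximum likelihood estimator of $\mu$ is

\[ \hat{\mu} = \bar{X}. \]

Similarly, we get

\[ -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} (x_i - \mu)^2 = 0 \]

implies

\[ \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2. \]

Again $\mu$ and $\sigma^2$ found by the first derivative test can be shown to be a maximum
using the second derivative test for functions of two variables. Hence,
using the estimator of $\mu$ in the above expression, we get the estimator of $\sigma^2$
to be

\[ \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2. \]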
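A short computation makes the divisor in Example 15.13 concrete. The sketch below uses made-up data and a hypothetical function name; it compares the maximum likelihood estimate of $\sigma^2$ (divisor n) with the sample variance $S^2$ (divisor n - 1) as computed by Python's `statistics` module.

```python
from statistics import fmean, pvariance, variance

def normal_mle(sample):
    """Maximum likelihood estimates (mu_hat, sigma2_hat) for a normal sample."""
    mu_hat = fmean(sample)
    sigma2_hat = sum((x - mu_hat) ** 2 for x in sample) / len(sample)
    return mu_hat, sigma2_hat

data = [4.0, 7.0, 13.0, 16.0]
mu_hat, sigma2_hat = normal_mle(data)
# pvariance divides by n, variance divides by n - 1.
print(mu_hat, sigma2_hat, pvariance(data), variance(data))
```

For these four values the estimate with divisor n is 22.5 while $S^2$ is 30, which illustrates that the maximum likelihood estimator of $\sigma^2$ is not the sample variance.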


Example 15.14. Suppose $X_1, X_2, ..., X_n$ is a random sample from a
distribution with density function

\[ f(x; \alpha, \beta) = \begin{cases} \dfrac{1}{\beta - \alpha} & \text{if } \alpha < x < \beta \\ 0 & \text{otherwise.} \end{cases} \]

Find the estimators of $\alpha$ and $\beta$ by the method of maximum likelihood.

Answer: The likelihood function of the sample is given by

\[ L(\alpha, \beta) = \prod_{i=1}^{n} \frac{1}{\beta - \alpha} = \left( \frac{1}{\beta - \alpha} \right)^n \]

for all $\alpha \le x_i$ $(i = 1, 2, ..., n)$ and for all $\beta \ge x_i$ $(i = 1, 2, ..., n)$. Hence,
the domain of the likelihood function is

\[ \Omega = \{ (\alpha, \beta) \mid 0 < \alpha \le x_{(1)} \text{ and } x_{(n)} \le \beta < \infty \}. \]

It is easy to see that $L(\alpha, \beta)$ is maximum if $\alpha = x_{(1)}$ and $\beta = x_{(n)}$. Therefore,
the maximum likelihood estimators of $\alpha$ and $\beta$ are

\[ \hat{\alpha} = X_{(1)} \qquad \text{and} \qquad \hat{\beta} = X_{(n)}. \]

The maximum likelihood estimator $\hat{\theta}$ of a parameter $\theta$ has a remarkable
property known as the invariance property. This invariance property says
that if $\hat{\theta}$ is a maximum likelihood estimator of $\theta$, then $g(\hat{\theta})$ is the maximum
likelihood estimator of $g(\theta)$, where g is a function from $\mathbb{R}^k$ to a subset of $\mathbb{R}^m$.
This result was proved by Zehna in 1966. We state this result as a theorem
without a proof.

Theorem 15.1. Let $\hat{\theta}$ be a maximum likelihood estimator of a parameter $\theta$
and let $g(\theta)$ be a function of $\theta$. Then the maximum likelihood estimator of
$g(\theta)$ is given by $g(\hat{\theta})$.

Now we give two examples to illustrate the importance of this theorem.

Example 15.15. Let $X_1, X_2, ..., X_n$ be a random sample from a normal
population with mean $\mu$ and variance $\sigma^2$. What are the maximum likelihood
estimators of $\sigma$ and $\mu - \sigma$?

Answer: From Example 15.13, we have the maximum likelihood estimators
of $\mu$ and $\sigma^2$ to be

\[ \hat{\mu} = \bar{X} \]

and

\[ \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2 =: \Sigma^2 \quad \text{(say)}. \]

Now using the invariance property of the maximum likelihood estimator we
have

\[ \hat{\sigma} = \Sigma \]

and

\[ \widehat{\mu - \sigma} = \bar{X} - \Sigma. \]


Example 15.16. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from a distribution with density function

$$f(x; \alpha, \beta) = \begin{cases} \dfrac{1}{\beta - \alpha} & \text{if } \alpha < x < \beta \\ 0 & \text{otherwise.} \end{cases}$$

Find the estimator of $\sqrt{\alpha^2 + \beta^2}$ by the method of maximum likelihood.

Answer: From Example 15.14, we have the maximum likelihood estimators of $\alpha$ and $\beta$ to be

$$\hat{\alpha} = X_{(1)} \qquad \text{and} \qquad \hat{\beta} = X_{(n)},$$

respectively. Now using the invariance property of the maximum likelihood estimator we see that the maximum likelihood estimator of $\sqrt{\alpha^2 + \beta^2}$ is

$$\sqrt{X_{(1)}^2 + X_{(n)}^2}.$$
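In practice the invariance property amounts to plugging the maximum likelihood estimates into the function of interest. The sketch below does this for the quantities in Examples 15.15 and 15.16; the data values and the helper names are hypothetical and only serve the illustration.

```python
import math

def mle_normal(data):
    """MLEs of the mean and variance of a normal sample (divisor n)."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    return mean, variance

def mle_uniform(data):
    """MLEs of the endpoints of a uniform sample."""
    return min(data), max(data)

data = [2.1, 3.4, 1.9, 4.2, 2.8]  # hypothetical observations

# Example 15.15: estimates of sigma and mu - sigma by plug-in.
mu_hat, sigma2_hat = mle_normal(data)
sigma_hat = math.sqrt(sigma2_hat)
mu_minus_sigma_hat = mu_hat - sigma_hat

# Example 15.16: estimate of sqrt(alpha^2 + beta^2) by plug-in.
alpha_hat, beta_hat = mle_uniform(data)
norm_hat = math.sqrt(alpha_hat ** 2 + beta_hat ** 2)
```

The same data are used for both models purely for brevity; each plug-in step is valid only under its own model.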

The concept of information in statistics was introduced by Sir Ronald 
Fisher, and it is known as Fisher information. 

Definition 15.4. Let $X$ be an observation from a population with probability density function $f(x; \theta)$. Suppose $f(x; \theta)$ is continuous, twice differentiable and its support does not depend on $\theta$. Then the Fisher information, $I(\theta)$, in a single observation $X$ about $\theta$ is given by

$$I(\theta) = \int_{-\infty}^{\infty} \left[ \frac{d \ln f(x; \theta)}{d\theta} \right]^2 f(x; \theta) \, dx.$$

Thus $I(\theta)$ is the expected value of the square of the random variable $\frac{d \ln f(X; \theta)}{d\theta}$. That is,

$$I(\theta) = E\left( \left[ \frac{d \ln f(X; \theta)}{d\theta} \right]^2 \right).$$

In the following lemma, we give an alternative formula for the Fisher 
information. 


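Lemma 15.1. The Fisher information contained in a single observation about the unknown parameter $\theta$ can be given alternatively as

$$I(\theta) = -\int_{-\infty}^{\infty} \left[ \frac{d^2 \ln f(x; \theta)}{d\theta^2} \right] f(x; \theta) \, dx.$$

Proof: Since $f(x; \theta)$ is a probability density function,

$$\int_{-\infty}^{\infty} f(x; \theta) \, dx = 1. \qquad (3)$$

Differentiating (3) with respect to $\theta$, we get

$$\frac{d}{d\theta} \int_{-\infty}^{\infty} f(x; \theta) \, dx = 0.$$

Rewriting the last equality, we obtain

$$\int_{-\infty}^{\infty} \frac{d f(x; \theta)}{d\theta} \, \frac{1}{f(x; \theta)} \, f(x; \theta) \, dx = 0$$

which is

$$\int_{-\infty}^{\infty} \frac{d \ln f(x; \theta)}{d\theta} \, f(x; \theta) \, dx = 0. \qquad (4)$$

Now differentiating (4) with respect to $\theta$, we see that

$$\int_{-\infty}^{\infty} \left[ \frac{d^2 \ln f(x; \theta)}{d\theta^2} \, f(x; \theta) + \frac{d \ln f(x; \theta)}{d\theta} \, \frac{d f(x; \theta)}{d\theta} \right] dx = 0.$$

Rewriting the last equality, we have

$$\int_{-\infty}^{\infty} \left[ \frac{d^2 \ln f(x; \theta)}{d\theta^2} \, f(x; \theta) + \frac{d \ln f(x; \theta)}{d\theta} \, \frac{d f(x; \theta)}{d\theta} \, \frac{1}{f(x; \theta)} \, f(x; \theta) \right] dx = 0$$

which is

$$\int_{-\infty}^{\infty} \left( \frac{d^2 \ln f(x; \theta)}{d\theta^2} + \left[ \frac{d \ln f(x; \theta)}{d\theta} \right]^2 \right) f(x; \theta) \, dx = 0.$$

The last equality implies that

$$\int_{-\infty}^{\infty} \left[ \frac{d \ln f(x; \theta)}{d\theta} \right]^2 f(x; \theta) \, dx = -\int_{-\infty}^{\infty} \left[ \frac{d^2 \ln f(x; \theta)}{d\theta^2} \right] f(x; \theta) \, dx.$$

Hence using the definition of Fisher information, we have

$$I(\theta) = -\int_{-\infty}^{\infty} \left[ \frac{d^2 \ln f(x; \theta)}{d\theta^2} \right] f(x; \theta) \, dx$$

and the proof of the lemma is now complete.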

The following two examples illustrate how one can determine Fisher in¬ 
formation. 

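Example 15.17. Let $X$ be a single observation taken from a normal population with unknown mean $\mu$ and known variance $\sigma^2$. Find the Fisher information in a single observation $X$ about $\mu$.

Answer: Since $X \sim N(\mu, \sigma^2)$, the probability density of $X$ is given by

$$f(x; \mu) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2\sigma^2}(x - \mu)^2}.$$

Hence

$$\ln f(x; \mu) = -\frac{1}{2} \ln(2\pi\sigma^2) - \frac{(x - \mu)^2}{2\sigma^2}.$$

Therefore

$$\frac{d \ln f(x; \mu)}{d\mu} = \frac{x - \mu}{\sigma^2}$$

and

$$\frac{d^2 \ln f(x; \mu)}{d\mu^2} = -\frac{1}{\sigma^2}.$$

Hence

$$I(\mu) = -\int_{-\infty}^{\infty} \left( -\frac{1}{\sigma^2} \right) f(x; \mu) \, dx = \frac{1}{\sigma^2}.$$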
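Both expressions for the Fisher information, the expected squared score of Definition 15.4 and the negative expected second derivative of Lemma 15.1, can be compared by simulation for the normal model of Example 15.17. This is a rough Monte Carlo sketch; the parameter values, sample size and function names are assumptions chosen for illustration.

```python
import random

def score(x, mu, sigma2):
    """First derivative of ln f(x; mu) with respect to mu for N(mu, sigma2)."""
    return (x - mu) / sigma2

def second_derivative(x, mu, sigma2):
    """Second derivative of ln f(x; mu) with respect to mu (constant in x)."""
    return -1.0 / sigma2

random.seed(2)
mu, sigma2 = 1.0, 4.0  # illustrative values
draws = [random.gauss(mu, sigma2 ** 0.5) for _ in range(200_000)]

# Definition 15.4: expected value of the squared score.
info_from_score = sum(score(x, mu, sigma2) ** 2 for x in draws) / len(draws)

# Lemma 15.1: minus the expected second derivative.
info_from_curvature = -sum(second_derivative(x, mu, sigma2) for x in draws) / len(draws)

exact = 1.0 / sigma2  # value derived in Example 15.17
```

The curvature form is exact here because the second derivative does not depend on $x$; the score form agrees only up to simulation error.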


Example 15.18. Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal population with unknown mean $\mu$ and known variance $\sigma^2$. Find the Fisher information in this sample of size $n$ about $\mu$.

Answer: Let $I_n(\mu)$ be the required Fisher information. Then from the definition, we have

$$\begin{aligned}
I_n(\mu) &= -E\left( \frac{d^2 \ln f(X_1, X_2, \ldots, X_n; \mu)}{d\mu^2} \right) \\
&= -E\left( \frac{d^2}{d\mu^2} \left\{ \ln f(X_1; \mu) + \cdots + \ln f(X_n; \mu) \right\} \right) \\
&= -E\left( \frac{d^2 \ln f(X_1; \mu)}{d\mu^2} \right) - \cdots - E\left( \frac{d^2 \ln f(X_n; \mu)}{d\mu^2} \right) \\
&= I(\mu) + \cdots + I(\mu) \\
&= n \, I(\mu) \\
&= n \cdot \frac{1}{\sigma^2} \qquad \text{(using Example 15.17)}.
\end{aligned}$$

This example shows that if $X_1, X_2, \ldots, X_n$ is a random sample from a population $X \sim f(x; \theta)$, then the Fisher information, $I_n(\theta)$, in a sample of size $n$ about the parameter $\theta$ is equal to $n$ times the Fisher information in $X$ about $\theta$. Thus

$$I_n(\theta) = n \, I(\theta).$$

If $X$ is a random variable with probability density function $f(x; \theta)$, where $\theta = (\theta_1, \ldots, \theta_n)$ is an unknown parameter vector, then the Fisher information, $I(\theta)$, is an $n \times n$ matrix given by

$$I(\theta) = \left( I_{ij}(\theta) \right) = \left( -E\left( \frac{\partial^2 \ln f(X; \theta)}{\partial \theta_i \, \partial \theta_j} \right) \right).$$

Example 15.19. Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal population with mean $\mu$ and variance $\sigma^2$. What is the Fisher information matrix, $I_n(\mu, \sigma^2)$, of the sample of size $n$ about the parameters $\mu$ and $\sigma^2$?

Answer: Let us write $\theta_1 = \mu$ and $\theta_2 = \sigma^2$. The Fisher information, $I_n(\theta)$, in a sample of size $n$ about the parameter $(\theta_1, \theta_2)$ is equal to $n$ times the Fisher information in the population about $(\theta_1, \theta_2)$, that is

$$I_n(\theta_1, \theta_2) = n \, I(\theta_1, \theta_2). \qquad (5)$$



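Since there are two parameters $\theta_1$ and $\theta_2$, the Fisher information matrix $I(\theta_1, \theta_2)$ is a $2 \times 2$ matrix given by

$$I(\theta_1, \theta_2) = \begin{pmatrix} I_{11}(\theta_1, \theta_2) & I_{12}(\theta_1, \theta_2) \\ I_{21}(\theta_1, \theta_2) & I_{22}(\theta_1, \theta_2) \end{pmatrix} \qquad (6)$$

where $I_{ij}(\theta_1, \theta_2) = -E\left( \frac{\partial^2 \ln f(X; \theta_1, \theta_2)}{\partial \theta_i \, \partial \theta_j} \right)$. Since

$$\ln f(x; \theta_1, \theta_2) = -\frac{1}{2} \ln(2\pi\theta_2) - \frac{(x - \theta_1)^2}{2\theta_2},$$

taking partial derivatives we have

$$\frac{\partial \ln f(x; \theta_1, \theta_2)}{\partial \theta_1} = \frac{x - \theta_1}{\theta_2},$$

$$\frac{\partial \ln f(x; \theta_1, \theta_2)}{\partial \theta_2} = -\frac{1}{2\theta_2} + \frac{(x - \theta_1)^2}{2\theta_2^2},$$

$$\frac{\partial^2 \ln f(x; \theta_1, \theta_2)}{\partial \theta_1^2} = -\frac{1}{\theta_2},$$

$$\frac{\partial^2 \ln f(x; \theta_1, \theta_2)}{\partial \theta_2^2} = \frac{1}{2\theta_2^2} - \frac{(x - \theta_1)^2}{\theta_2^3},$$

$$\frac{\partial^2 \ln f(x; \theta_1, \theta_2)}{\partial \theta_1 \, \partial \theta_2} = -\frac{x - \theta_1}{\theta_2^2}.$$

Hence

$$I_{11}(\theta_1, \theta_2) = -E\left( -\frac{1}{\theta_2} \right) = \frac{1}{\theta_2} = \frac{1}{\sigma^2}.$$

Similarly,

$$I_{21}(\theta_1, \theta_2) = I_{12}(\theta_1, \theta_2) = -E\left( -\frac{X - \theta_1}{\theta_2^2} \right) = \frac{E(X)}{\theta_2^2} - \frac{\theta_1}{\theta_2^2} = \frac{\theta_1}{\theta_2^2} - \frac{\theta_1}{\theta_2^2} = 0$$

and

$$I_{22}(\theta_1, \theta_2) = -E\left( -\frac{(X - \theta_1)^2}{\theta_2^3} + \frac{1}{2\theta_2^2} \right) = \frac{E\left( (X - \theta_1)^2 \right)}{\theta_2^3} - \frac{1}{2\theta_2^2} = \frac{\theta_2}{\theta_2^3} - \frac{1}{2\theta_2^2} = \frac{1}{2\theta_2^2} = \frac{1}{2\sigma^4}.$$

Thus from (5), (6) and the above calculations, the Fisher information matrix is given by

$$I_n(\theta_1, \theta_2) = n \begin{pmatrix} \dfrac{1}{\sigma^2} & 0 \\ 0 & \dfrac{1}{2\sigma^4} \end{pmatrix} = \begin{pmatrix} \dfrac{n}{\sigma^2} & 0 \\ 0 & \dfrac{n}{2\sigma^4} \end{pmatrix}.$$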
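The single-observation matrix in Example 15.19 can be approximated by averaging the negative second partial derivatives of $\ln f$ over simulated normal draws. The sketch below is only a sanity check under assumed parameter values; the function name `negative_hessian` is hypothetical.

```python
import random

def negative_hessian(x, mu, sigma2):
    """Minus the second partial derivatives of ln f(x; mu, sigma2), normal density."""
    d11 = 1.0 / sigma2
    d12 = (x - mu) / sigma2 ** 2
    d22 = (x - mu) ** 2 / sigma2 ** 3 - 1.0 / (2 * sigma2 ** 2)
    return d11, d12, d22

random.seed(5)
mu, sigma2, draws = 3.0, 2.0, 200_000  # illustrative values
totals = [0.0, 0.0, 0.0]
for _ in range(draws):
    x = random.gauss(mu, sigma2 ** 0.5)
    for k, value in enumerate(negative_hessian(x, mu, sigma2)):
        totals[k] += value

i11, i12, i22 = (t / draws for t in totals)
exact = (1.0 / sigma2, 0.0, 1.0 / (2 * sigma2 ** 2))  # entries derived in the example
```

Multiplying the three averaged entries by $n$ gives an approximation to the information in a sample of size $n$.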


Now we present an important theorem about the maximum likelihood 
estimator without a proof. 

Theorem 15.2. Under certain regularity conditions on $f(x; \theta)$, the maximum likelihood estimator $\hat{\theta}$ of $\theta$ based on a random sample of size $n$ from a population $X$ with probability density $f(x; \theta)$ is asymptotically normally distributed with mean $\theta$ and variance $\frac{1}{n I(\theta)}$. That is

$$\hat{\theta}_{ML} \sim N\left( \theta, \frac{1}{n I(\theta)} \right) \qquad \text{as } n \to \infty.$$
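A small simulation can make the asymptotic variance $1/(n I(\theta))$ concrete. The sketch below uses an exponential population with rate $\theta$, a model chosen here only for illustration and not taken from the text; for it the maximum likelihood estimator is $1/\bar{X}$ and $I(\theta) = 1/\theta^2$. All numerical settings are assumptions.

```python
import random
import statistics

def exponential_rate_mle(data):
    """MLE of theta for the density theta * exp(-theta * x), x > 0."""
    return len(data) / sum(data)

random.seed(3)
theta, n, replications = 2.0, 200, 2000  # illustrative settings

# Repeat the experiment many times and record the MLE each time.
estimates = [
    exponential_rate_mle([random.expovariate(theta) for _ in range(n)])
    for _ in range(replications)
]

empirical_variance = statistics.variance(estimates)
theoretical_variance = theta ** 2 / n  # 1 / (n I(theta)) with I(theta) = 1 / theta^2
ratio = empirical_variance / theoretical_variance
```

For moderate $n$ the ratio should be close to, but not exactly, one, since the theorem is a statement about the limit.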


The following example shows that the maximum likelihood estimator of 
a parameter is not necessarily unique. 

Example 15.20. If $X_1, X_2, \ldots, X_n$ is a random sample from a distribution with density function

$$f(x; \theta) = \begin{cases} \dfrac{1}{2} & \text{if } \theta - 1 < x < \theta + 1 \\ 0 & \text{otherwise,} \end{cases}$$

then what is the maximum likelihood estimator of $\theta$?

Answer: The likelihood function of this sample is given by

$$L(\theta) = \begin{cases} \left( \dfrac{1}{2} \right)^n & \text{if } \max\{x_1, \ldots, x_n\} - 1 < \theta < \min\{x_1, \ldots, x_n\} + 1 \\ 0 & \text{otherwise.} \end{cases}$$

Since the likelihood function is a constant, any value in the interval $[\max\{x_1, \ldots, x_n\} - 1, \; \min\{x_1, \ldots, x_n\} + 1]$ is a maximum likelihood estimate of $\theta$.
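The flat likelihood of Example 15.20 is easy to see numerically. The sketch below uses a hypothetical sample and treats the support as closed at its endpoints (an assumption); every value of $\theta$ inside the interval gives the same likelihood, and values outside give zero.

```python
def likelihood(data, theta):
    """Likelihood for the uniform density 1/2 on [theta - 1, theta + 1]."""
    if all(theta - 1 <= x <= theta + 1 for x in data):
        return 0.5 ** len(data)
    return 0.0

def mle_interval(data):
    """Interval of parameter values that all maximize the likelihood."""
    return max(data) - 1, min(data) + 1

sample = [3.2, 3.9, 4.1, 3.5]  # hypothetical observations with range below 2
lower, upper = mle_interval(sample)
midpoint = (lower + upper) / 2

inside = [likelihood(sample, t) for t in (lower + 0.01, midpoint, upper - 0.01)]
outside = [likelihood(sample, t) for t in (lower - 0.01, upper + 0.01)]
```

Any rule that picks a point of the interval, such as the midpoint, yields a valid maximum likelihood estimate in this model.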


Example 15.21. What is the basic principle of maximum likelihood esti¬ 
mation? 


Answer: To choose a value of the parameter for which the observed data 
have as high a probability or density as possible. In other words a maximum 
likelihood estimate is a parameter value under which the sample data have 
the highest probability. 

15.3. Bayesian Method 

In the classical approach, the parameter $\theta$ is assumed to be an unknown, but fixed, quantity. A random sample $X_1, X_2, \ldots, X_n$ is drawn from a population with probability density function $f(x; \theta)$ and, based on the observed values in the sample, knowledge about the value of $\theta$ is obtained.

In the Bayesian approach, $\theta$ is considered to be a quantity whose variation can be described by a probability distribution (known as the prior distribution). This is a subjective distribution, based on the experimenter's belief, and is formulated before the data are seen (hence the name prior distribution). A sample is then taken from a population where $\theta$ is a parameter and the prior distribution is updated with this sample information. This updated prior is called the posterior distribution. The updating is done with the help of Bayes' theorem, hence the name Bayesian method.

In this section, we shall denote the population density $f(x; \theta)$ as $f(x/\theta)$, that is, the density of the population $X$ given the parameter $\theta$.

Definition 15.5. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with density $f(x/\theta)$, where $\theta$ is the unknown parameter to be estimated. The probability density function of the random variable $\theta$ is called the prior distribution of $\theta$ and is usually denoted by $h(\theta)$.

Definition 15.6. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with density $f(x/\theta)$, where $\theta$ is the unknown parameter to be estimated. The conditional density, $k(\theta/x_1, x_2, \ldots, x_n)$, of $\theta$ given the sample $x_1, x_2, \ldots, x_n$ is called the posterior distribution of $\theta$.

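Example 15.22. Let $X_1 = 1$, $X_2 = 2$ be a random sample of size 2 from a distribution with probability density function

$$f(x/\theta) = \binom{3}{x} \theta^x (1 - \theta)^{3 - x}, \qquad x = 0, 1, 2, 3.$$

If the prior density of $\theta$ is

$$h(\theta) = \begin{cases} k & \text{if } \frac{1}{2} < \theta < 1 \\ 0 & \text{otherwise,} \end{cases}$$

what is the posterior distribution of $\theta$?

Answer: Since $h(\theta)$ is the probability density of $\theta$, we should get

$$\int_{\frac{1}{2}}^{1} h(\theta) \, d\theta = 1$$

which implies

$$\int_{\frac{1}{2}}^{1} k \, d\theta = 1.$$

Therefore $k = 2$. The joint density of the sample and the parameter is given by

$$\begin{aligned}
u(x_1, x_2, \theta) &= f(x_1/\theta) \, f(x_2/\theta) \, h(\theta) \\
&= \binom{3}{x_1} \theta^{x_1} (1 - \theta)^{3 - x_1} \binom{3}{x_2} \theta^{x_2} (1 - \theta)^{3 - x_2} \, 2 \\
&= 2 \binom{3}{x_1} \binom{3}{x_2} \theta^{x_1 + x_2} (1 - \theta)^{6 - x_1 - x_2}.
\end{aligned}$$

Hence,

$$u(1, 2, \theta) = 2 \binom{3}{1} \binom{3}{2} \theta^3 (1 - \theta)^3 = 18 \, \theta^3 (1 - \theta)^3.$$

The marginal distribution of the sample is

$$\begin{aligned}
g(1, 2) &= \int_{\frac{1}{2}}^{1} u(1, 2, \theta) \, d\theta \\
&= \int_{\frac{1}{2}}^{1} 18 \, \theta^3 (1 - \theta)^3 \, d\theta \\
&= 18 \int_{\frac{1}{2}}^{1} \theta^3 \left( 1 + 3\theta^2 - 3\theta - \theta^3 \right) d\theta \\
&= 18 \int_{\frac{1}{2}}^{1} \left( \theta^3 + 3\theta^5 - 3\theta^4 - \theta^6 \right) d\theta \\
&= \frac{9}{140}.
\end{aligned}$$

The conditional distribution of the parameter $\theta$ given the sample $X_1 = 1$ and $X_2 = 2$ is given by

$$k(\theta/x_1 = 1, x_2 = 2) = \frac{u(1, 2, \theta)}{g(1, 2)} = \frac{18 \, \theta^3 (1 - \theta)^3}{\frac{9}{140}} = 280 \, \theta^3 (1 - \theta)^3.$$

Therefore, the posterior distribution of $\theta$ is

$$k(\theta/x_1 = 1, x_2 = 2) = \begin{cases} 280 \, \theta^3 (1 - \theta)^3 & \text{if } \frac{1}{2} < \theta < 1 \\ 0 & \text{otherwise.} \end{cases}$$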
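The constants $\frac{9}{140}$ and $280$ in Example 15.22 can be reproduced with exact rational arithmetic. The following sketch expands the joint density as a polynomial in $\theta$ and integrates it term by term over $(\frac{1}{2}, 1)$; the helper names are hypothetical and not part of the text.

```python
from fractions import Fraction
from math import comb

def integrate_polynomial(coeffs, a, b):
    """Exact integral over [a, b] of sum(coeffs[k] * t**k for each k)."""
    return sum(c * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
               for k, c in enumerate(coeffs))

def joint_coefficients(x1, x2, prior_height=2, trials=3):
    """Coefficients, in powers of theta, of h(theta) f(x1/theta) f(x2/theta)."""
    s = x1 + x2                 # exponent of theta
    m = 2 * trials - s          # exponent of (1 - theta)
    scale = prior_height * comb(trials, x1) * comb(trials, x2)
    coeffs = [Fraction(0)] * (s + m + 1)
    for j in range(m + 1):      # binomial expansion of (1 - theta)**m
        coeffs[s + j] = Fraction(scale * comb(m, j) * (-1) ** j)
    return coeffs

coeffs = joint_coefficients(1, 2)   # 18 * theta^3 * (1 - theta)^3, expanded
marginal = integrate_polynomial(coeffs, Fraction(1, 2), Fraction(1))
normalizer = 18 / marginal          # constant multiplying theta^3 (1 - theta)^3
```

Working with `Fraction` avoids rounding, so the marginal and the posterior constant come out as exact rationals.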


Remark 15.2. If $X_1, X_2, \ldots, X_n$ is a random sample from a population with density $f(x/\theta)$, then the joint density of the sample and the parameter is given by

$$u(x_1, x_2, \ldots, x_n, \theta) = h(\theta) \prod_{i=1}^{n} f(x_i/\theta).$$

Given this joint density, the marginal density of the sample can be computed using the formula

$$g(x_1, x_2, \ldots, x_n) = \int_{-\infty}^{\infty} h(\theta) \prod_{i=1}^{n} f(x_i/\theta) \, d\theta.$$

Now using the Bayes rule, the posterior distribution of $\theta$ can be computed as follows:

$$k(\theta/x_1, x_2, \ldots, x_n) = \frac{h(\theta) \prod_{i=1}^{n} f(x_i/\theta)}{\int_{-\infty}^{\infty} h(\theta) \prod_{i=1}^{n} f(x_i/\theta) \, d\theta}.$$

In the Bayesian method, we use two types of loss functions.


Definition 15.7. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with density $f(x/\theta)$, where $\theta$ is the unknown parameter to be estimated. Let $\hat{\theta}$ be an estimator of $\theta$. The function

$$\mathcal{L}_2(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$$

is called the squared error loss. The function

$$\mathcal{L}_1(\hat{\theta}, \theta) = |\hat{\theta} - \theta|$$

is called the absolute error loss.

The loss function $\mathcal{L}$ represents the 'loss' incurred when $\hat{\theta}$ is used in place of the parameter $\theta$.

Definition 15.8. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with density $f(x/\theta)$, where $\theta$ is the unknown parameter to be estimated. Let $\hat{\theta}$ be an estimator of $\theta$ and let $\mathcal{L}(\hat{\theta}, \theta)$ be a given loss function. The expected value of this loss function with respect to the population distribution $f(x/\theta)$, that is

$$R_{\mathcal{L}}(\theta) = \int \mathcal{L}(\hat{\theta}, \theta) \, f(x/\theta) \, dx,$$

is called the risk.

The posterior density of the parameter $\theta$ given the sample $x_1, x_2, \ldots, x_n$, that is

$$k(\theta/x_1, x_2, \ldots, x_n),$$

contains all information about $\theta$. In Bayesian estimation of a parameter one chooses an estimate $\hat{\theta}$ for $\theta$ such that

$$k(\hat{\theta}/x_1, x_2, \ldots, x_n)$$

is maximum subject to a loss function. Mathematically, this is equivalent to minimizing the integral

$$\int_{\Omega} \mathcal{L}(\hat{\theta}, \theta) \, k(\theta/x_1, x_2, \ldots, x_n) \, d\theta$$

with respect to $\hat{\theta}$, where $\Omega$ denotes the support of the prior density $h(\theta)$ of the parameter $\theta$.


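Example 15.23. Suppose one observation was taken of a random variable $X$ which yielded the value 2. The density function for $X$ is

$$f(x/\theta) = \begin{cases} \dfrac{1}{\theta} & \text{if } 0 < x < \theta \\ 0 & \text{otherwise,} \end{cases}$$

and the prior distribution for the parameter $\theta$ is

$$h(\theta) = \begin{cases} \dfrac{3}{\theta^4} & \text{if } 1 < \theta < \infty \\ 0 & \text{otherwise.} \end{cases}$$

If the loss function is $\mathcal{L}(z, \theta) = (z - \theta)^2$, then what is the Bayes' estimate for $\theta$?

Answer: The prior density of the random variable $\theta$ is

$$h(\theta) = \begin{cases} \dfrac{3}{\theta^4} & \text{if } 1 < \theta < \infty \\ 0 & \text{otherwise.} \end{cases}$$

The probability density function of the population is

$$f(x/\theta) = \begin{cases} \dfrac{1}{\theta} & \text{if } 0 < x < \theta \\ 0 & \text{otherwise.} \end{cases}$$

Hence, the joint probability density function of the sample and the parameter is given by

$$u(x, \theta) = h(\theta) \, f(x/\theta) = \frac{3}{\theta^4} \cdot \frac{1}{\theta} = \begin{cases} 3 \, \theta^{-5} & \text{if } 0 < x < \theta \text{ and } 1 < \theta < \infty \\ 0 & \text{otherwise.} \end{cases}$$

The marginal density of the sample, for $x \ge 1$, is given by

$$g(x) = \int_{x}^{\infty} u(x, \theta) \, d\theta = \int_{x}^{\infty} 3 \, \theta^{-5} \, d\theta = \frac{3}{4} x^{-4} = \frac{3}{4 x^4}.$$

Thus, if $x = 2$, then $g(2) = \frac{3}{64}$. The posterior density of $\theta$ when $x = 2$ is given by

$$k(\theta/x = 2) = \frac{u(2, \theta)}{g(2)} = \frac{64}{3} \, 3 \, \theta^{-5} = \begin{cases} 64 \, \theta^{-5} & \text{if } 2 < \theta < \infty \\ 0 & \text{otherwise.} \end{cases}$$

Now, we find the Bayes estimator by minimizing the expression $E\left[ \mathcal{L}(\theta, z)/x = 2 \right]$. That is

$$\hat{\theta} = \operatorname{Arg} \min_{z \in \Omega} \int_{\Omega} \mathcal{L}(\theta, z) \, k(\theta/x = 2) \, d\theta.$$

Let us call this integral $\psi(z)$. Then

$$\begin{aligned}
\psi(z) &= \int_{\Omega} \mathcal{L}(\theta, z) \, k(\theta/x = 2) \, d\theta \\
&= \int_{2}^{\infty} (z - \theta)^2 \, k(\theta/x = 2) \, d\theta \\
&= \int_{2}^{\infty} (z - \theta)^2 \, 64 \, \theta^{-5} \, d\theta.
\end{aligned}$$

We want to find the value of $z$ which yields a minimum of $\psi(z)$. This can be done by taking the derivative of $\psi(z)$ and finding where the derivative is zero.

$$\begin{aligned}
\frac{d}{dz} \psi(z) &= \frac{d}{dz} \int_{2}^{\infty} (z - \theta)^2 \, 64 \, \theta^{-5} \, d\theta \\
&= 2 \int_{2}^{\infty} (z - \theta) \, 64 \, \theta^{-5} \, d\theta \\
&= 2 \int_{2}^{\infty} z \, 64 \, \theta^{-5} \, d\theta - 2 \int_{2}^{\infty} \theta \, 64 \, \theta^{-5} \, d\theta \\
&= 2z - \frac{16}{3}.
\end{aligned}$$

Setting this derivative of $\psi(z)$ to zero and solving for $z$, we get

$$2z - \frac{16}{3} = 0 \quad \Longrightarrow \quad z = \frac{8}{3}.$$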
Since $\frac{d^2 \psi(z)}{dz^2} = 2 > 0$, the function $\psi(z)$ has a minimum at $z = \frac{8}{3}$. Hence, the Bayes' estimate of $\theta$ is $\frac{8}{3}$.

In Example 15.23, we have found the Bayes' estimate of $\theta$ by directly minimizing $\int_{\Omega} \mathcal{L}(\hat{\theta}, \theta) \, k(\theta/x_1, x_2, \ldots, x_n) \, d\theta$ with respect to $\hat{\theta}$. The next result is very useful for finding the Bayes' estimate under a quadratic loss function. Notice that if $\mathcal{L}(\hat{\theta}, \theta) = (\theta - \hat{\theta})^2$, then $\int_{\Omega} \mathcal{L}(\hat{\theta}, \theta) \, k(\theta/x_1, x_2, \ldots, x_n) \, d\theta$ is $E\left( (\theta - \hat{\theta})^2 / x_1, x_2, \ldots, x_n \right)$. The following theorem is based on the fact that the function $\phi$ defined by $\phi(c) = E\left[ (X - c)^2 \right]$ attains its minimum if $c = E[X]$.

Theorem 15.3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with density $f(x/\theta)$, where $\theta$ is the unknown parameter to be estimated. If the loss function is squared error, then the Bayes' estimator $\hat{\theta}$ of the parameter $\theta$ is given by

$$\hat{\theta} = E(\theta/x_1, x_2, \ldots, x_n),$$

where the expectation is taken with respect to the density $k(\theta/x_1, x_2, \ldots, x_n)$.
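For the posterior $64\,\theta^{-5}$ on $(2, \infty)$ from Example 15.23, the agreement between direct minimization of the expected squared loss and the posterior mean can be checked exactly. The sketch below uses the closed-form posterior moments and a coarse grid search; the function names and the grid are assumptions made for illustration.

```python
from fractions import Fraction

def posterior_moment(k):
    """E[theta**k] under the posterior 64 * theta**-5 on (2, inf); needs k < 4."""
    return Fraction(64) * Fraction(2) ** (k - 4) / (4 - k)

def expected_squared_loss(z):
    """psi(z) = E[(z - theta)**2 | x = 2], written in terms of posterior moments."""
    return z * z - 2 * z * posterior_moment(1) + posterior_moment(2)

# Grid of candidate estimates from 2 to 4 in steps of 1/300.
grid = [Fraction(i, 300) for i in range(600, 1201)]
best_z = min(grid, key=expected_squared_loss)
posterior_mean = posterior_moment(1)
```

The grid is chosen so that it contains the posterior mean; on a different grid the minimizer would be the grid point nearest to it.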

Now we give several examples to illustrate the use of this theorem. 

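Example 15.24. Suppose the prior distribution of $\theta$ is uniform over the interval $(0, 1)$. Given $\theta$, the population $X$ is uniform over the interval $(0, \theta)$. If the squared error loss function is used, find the Bayes' estimator of $\theta$ based on a sample of size one.

Answer: The prior density of $\theta$ is given by

$$h(\theta) = \begin{cases} 1 & \text{if } 0 < \theta < 1 \\ 0 & \text{otherwise.} \end{cases}$$

The density of the population is given by

$$f(x/\theta) = \begin{cases} \dfrac{1}{\theta} & \text{if } 0 < x < \theta \\ 0 & \text{otherwise.} \end{cases}$$

The joint density of the sample and the parameter is given by

$$u(x, \theta) = h(\theta) \, f(x/\theta) = \begin{cases} \dfrac{1}{\theta} & \text{if } 0 < x < \theta < 1 \\ 0 & \text{otherwise.} \end{cases}$$

The marginal density of the sample is

$$g(x) = \int_{x}^{1} u(x, \theta) \, d\theta = \int_{x}^{1} \frac{1}{\theta} \, d\theta = \begin{cases} -\ln x & \text{if } 0 < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$

The conditional density of $\theta$ given the sample is

$$k(\theta/x) = \frac{u(x, \theta)}{g(x)} = \begin{cases} -\dfrac{1}{\theta \ln x} & \text{if } 0 < x < \theta < 1 \\ 0 & \text{elsewhere.} \end{cases}$$

Since the loss function is quadratic error, the Bayes' estimator of $\theta$ is

$$\begin{aligned}
\hat{\theta} &= E[\theta/x] \\
&= \int_{x}^{1} \theta \, k(\theta/x) \, d\theta \\
&= \int_{x}^{1} \theta \, \frac{-1}{\theta \ln x} \, d\theta \\
&= -\frac{1}{\ln x} \int_{x}^{1} d\theta \\
&= \frac{x - 1}{\ln x}.
\end{aligned}$$

Thus, the Bayes' estimator of $\theta$ based on one observation $X$ is

$$\hat{\theta} = \frac{X - 1}{\ln X}.$$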


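Example 15.25. Given $\theta$, the random variable $X$ has a binomial distribution with $n = 2$ and probability of success $\theta$. If the prior density of $\theta$ is

$$h(\theta) = \begin{cases} k & \text{if } \frac{1}{2} < \theta < 1 \\ 0 & \text{otherwise,} \end{cases}$$

what is the Bayes' estimate of $\theta$ for a squared error loss if $X = 1$?

Answer: Note that $\theta$ is uniform on the interval $\left( \frac{1}{2}, 1 \right)$, hence $k = 2$. Therefore, the prior density of $\theta$ is

$$h(\theta) = \begin{cases} 2 & \text{if } \frac{1}{2} < \theta < 1 \\ 0 & \text{otherwise.} \end{cases}$$

The population density is given by

$$f(x/\theta) = \binom{n}{x} \theta^x (1 - \theta)^{n - x} = \binom{2}{x} \theta^x (1 - \theta)^{2 - x}, \qquad x = 0, 1, 2.$$

The joint density of the sample and the parameter $\theta$ is

$$u(x, \theta) = h(\theta) \, f(x/\theta) = 2 \binom{2}{x} \theta^x (1 - \theta)^{2 - x}$$

where $\frac{1}{2} < \theta < 1$ and $x = 0, 1, 2$. The marginal density of the sample is given by

$$g(x) = \int_{\frac{1}{2}}^{1} u(x, \theta) \, d\theta.$$

This integral is easy to evaluate if we substitute $X = 1$ now. Hence

$$\begin{aligned}
g(1) &= \int_{\frac{1}{2}}^{1} 2 \binom{2}{1} \theta (1 - \theta) \, d\theta \\
&= \int_{\frac{1}{2}}^{1} \left( 4\theta - 4\theta^2 \right) d\theta \\
&= 4 \left[ \frac{\theta^2}{2} - \frac{\theta^3}{3} \right]_{\frac{1}{2}}^{1} \\
&= \frac{2}{3} \left[ 3\theta^2 - 2\theta^3 \right]_{\frac{1}{2}}^{1} \\
&= \frac{2}{3} \left[ (3 - 2) - \left( \frac{3}{4} - \frac{2}{8} \right) \right] \\
&= \frac{1}{3}.
\end{aligned}$$

Therefore, the posterior density of $\theta$ given $x = 1$ is

$$k(\theta/x = 1) = \frac{u(1, \theta)}{g(1)} = 12 \left( \theta - \theta^2 \right),$$

where $\frac{1}{2} < \theta < 1$. Since the loss function is quadratic error, the Bayes' estimate of $\theta$ is

$$\begin{aligned}
\hat{\theta} &= E[\theta/x = 1] \\
&= \int_{\frac{1}{2}}^{1} \theta \, k(\theta/x = 1) \, d\theta \\
&= \int_{\frac{1}{2}}^{1} 12 \, \theta \left( \theta - \theta^2 \right) d\theta \\
&= \left[ 4\theta^3 - 3\theta^4 \right]_{\frac{1}{2}}^{1} \\
&= 1 - \frac{5}{16} \\
&= \frac{11}{16}.
\end{aligned}$$

Hence, based on the sample of size one with $X = 1$, the Bayes' estimate of $\theta$ is $\frac{11}{16}$, that is

$$\hat{\theta} = \frac{11}{16}.$$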


The following theorem helps us to evaluate the Bayes estimate if the loss function is absolute error loss. This theorem is based on the fact that the function $\phi(c) = E\left[ |X - c| \right]$ is minimum if $c$ is the median of $X$.

Theorem 15.4. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with density $f(x/\theta)$, where $\theta$ is the unknown parameter to be estimated. If the loss function is absolute error, then the Bayes estimator $\hat{\theta}$ of the parameter $\theta$ is given by

$$\hat{\theta} = \text{median of } k(\theta/x_1, x_2, \ldots, x_n)$$

where $k(\theta/x_1, x_2, \ldots, x_n)$ is the posterior distribution of $\theta$.

The following are some examples to illustrate the above theorem.

Example 15.26. Given $\theta$, the random variable $X$ has a binomial distribution with $n = 3$ and probability of success $\theta$. If the prior density of $\theta$ is

$$h(\theta) = \begin{cases} k & \text{if } \frac{1}{2} < \theta < 1 \\ 0 & \text{otherwise,} \end{cases}$$

what is the Bayes' estimate of $\theta$ for an absolute difference error loss if the sample consists of one observation $x = 3$?





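Answer: Since the prior density of $\theta$ is

$$h(\theta) = \begin{cases} 2 & \text{if } \frac{1}{2} < \theta < 1 \\ 0 & \text{otherwise,} \end{cases}$$

and the population density is

$$f(x/\theta) = \binom{3}{x} \theta^x (1 - \theta)^{3 - x},$$

the joint density of the sample and the parameter is given by

$$u(3, \theta) = h(\theta) \, f(3/\theta) = 2 \, \theta^3,$$

where $\frac{1}{2} < \theta < 1$. The marginal density of the sample (at $x = 3$) is given by

$$g(3) = \int_{\frac{1}{2}}^{1} u(3, \theta) \, d\theta = \int_{\frac{1}{2}}^{1} 2 \, \theta^3 \, d\theta = \left[ \frac{\theta^4}{2} \right]_{\frac{1}{2}}^{1} = \frac{15}{32}.$$

Therefore, the conditional density of $\theta$ given $X = 3$ is

$$k(\theta/x = 3) = \frac{u(3, \theta)}{g(3)} = \begin{cases} \dfrac{64}{15} \, \theta^3 & \text{if } \frac{1}{2} < \theta < 1 \\ 0 & \text{elsewhere.} \end{cases}$$

Since the loss function is absolute error, the Bayes' estimator is the median of the probability density function $k(\theta/x = 3)$. That is

$$\frac{1}{2} = \int_{\frac{1}{2}}^{\hat{\theta}} \frac{64}{15} \, \theta^3 \, d\theta = \frac{64}{60} \left[ \theta^4 \right]_{\frac{1}{2}}^{\hat{\theta}} = \frac{64}{60} \left[ \hat{\theta}^4 - \frac{1}{16} \right].$$

Solving the above equation for $\hat{\theta}$, we get

$$\hat{\theta} = \sqrt[4]{\frac{17}{32}} = 0.8537.$$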
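When the posterior median has no convenient closed form, it can be found numerically from the posterior distribution function. The sketch below applies bisection to the posterior of Example 15.26 and compares the result with the closed form; the function names and the tolerance are assumptions made for this illustration.

```python
def posterior_cdf(theta):
    """CDF of the posterior (64/15) * theta**3 on (1/2, 1)."""
    return (64.0 / 60.0) * (theta ** 4 - 1.0 / 16.0)

def median_by_bisection(cdf, low, high, tol=1e-12):
    """Find the point where an increasing CDF crosses 1/2."""
    while high - low > tol:
        mid = (low + high) / 2
        if cdf(mid) < 0.5:
            low = mid    # median lies to the right of mid
        else:
            high = mid   # median lies at or to the left of mid
    return (low + high) / 2

bayes_estimate = median_by_bisection(posterior_cdf, 0.5, 1.0)
closed_form = (17 / 32) ** 0.25
```

Bisection only needs the distribution function to be continuous and increasing on the support, which holds for any posterior density that is positive there.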


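Example 15.27. Suppose the prior distribution of $\theta$ is uniform over the interval $(2, 5)$. Given $\theta$, $X$ is uniform over the interval $(0, \theta)$. What is the Bayes' estimator of $\theta$ for absolute error loss if $X = 1$?

Answer: Since the prior density of $\theta$ is

$$h(\theta) = \begin{cases} \dfrac{1}{3} & \text{if } 2 < \theta < 5 \\ 0 & \text{otherwise,} \end{cases}$$

and the population density is

$$f(x/\theta) = \begin{cases} \dfrac{1}{\theta} & \text{if } 0 < x < \theta \\ 0 & \text{elsewhere,} \end{cases}$$

the joint density of the sample and the parameter is given by

$$u(x, \theta) = h(\theta) \, f(x/\theta) = \frac{1}{3\theta},$$

where $2 < \theta < 5$ and $0 < x < \theta$. The marginal density of the sample (at $x = 1$) is given by

$$\begin{aligned}
g(1) &= \int_{1}^{\infty} u(1, \theta) \, d\theta \\
&= \int_{1}^{2} u(1, \theta) \, d\theta + \int_{2}^{5} u(1, \theta) \, d\theta \\
&= \int_{2}^{5} \frac{1}{3\theta} \, d\theta \\
&= \frac{1}{3} \ln\left( \frac{5}{2} \right),
\end{aligned}$$

since $u(1, \theta)$ vanishes outside $2 < \theta < 5$. Therefore, the conditional density of $\theta$ given the sample $x = 1$ is

$$k(\theta/x = 1) = \frac{u(1, \theta)}{g(1)} = \frac{1}{\theta \ln\left( \frac{5}{2} \right)}.$$

Since the loss function is absolute error, the Bayes estimate of $\theta$ is the median of $k(\theta/x = 1)$. Hence

$$\frac{1}{2} = \int_{2}^{\hat{\theta}} \frac{1}{\theta \ln\left( \frac{5}{2} \right)} \, d\theta = \frac{1}{\ln\left( \frac{5}{2} \right)} \ln\left( \frac{\hat{\theta}}{2} \right).$$

Solving for $\hat{\theta}$, we get

$$\hat{\theta} = \sqrt{10} = 3.16.$$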


Example 15.28. What is the basic principle of Bayesian estimation?

Answer: The basic principle behind the Bayesian estimation method consists of choosing a value of the parameter $\theta$ for which the observed data have as high a posterior probability $k(\theta/x_1, x_2, \ldots, x_n)$ of $\theta$ as possible subject to a loss function.

15.4. Review Exercises 


1. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution with a probability density function

$$f(x; \theta) = \begin{cases} \dfrac{1}{2\theta} & \text{if } -\theta < x < \theta \\ 0 & \text{otherwise,} \end{cases}$$

where $0 < \theta$ is a parameter. Using the moment method find an estimator for the parameter $\theta$.

2. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution with a probability density function

$$f(x; \theta) = \begin{cases} (\theta + 1) \, x^{-\theta - 2} & \text{if } 1 < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $0 < \theta$ is a parameter. Using the moment method find an estimator for the parameter $\theta$.

3. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution with a probability density function

$$f(x; \theta) = \begin{cases} \theta^2 \, x \, e^{-\theta x} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $0 < \theta$ is a parameter. Using the moment method find an estimator for the parameter $\theta$.

4. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution with a probability density function

$$f(x; \theta) = \begin{cases} \theta \, x^{\theta - 1} & \text{if } 0 < x < 1 \\ 0 & \text{otherwise,} \end{cases}$$

where $0 < \theta$ is a parameter. Using the maximum likelihood method find an estimator for the parameter $\theta$.

5. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution with a probability density function

$$f(x; \theta) = \begin{cases} (\theta + 1) \, x^{-\theta - 2} & \text{if } 1 < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $0 < \theta$ is a parameter. Using the maximum likelihood method find an estimator for the parameter $\theta$.

6. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution with a probability density function

$$f(x; \theta) = \begin{cases} \theta^2 \, x \, e^{-\theta x} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $0 < \theta$ is a parameter. Using the maximum likelihood method find an estimator for the parameter $\theta$.


7. Let $X_1, X_2, X_3, X_4$ be a random sample from a distribution with density function

$$f(x; \beta) = \begin{cases} \dfrac{1}{\beta} \, e^{-\frac{(x - 4)}{\beta}} & \text{for } x > 4 \\ 0 & \text{otherwise,} \end{cases}$$

where $\beta > 0$. If the data from this random sample are 8.2, 9.1, 10.6 and 4.9, respectively, what is the maximum likelihood estimate of $\beta$?

8. Given $\theta$, the random variable $X$ has a binomial distribution with $n = 2$ and probability of success $\theta$. If the prior density of $\theta$ is

$$h(\theta) = \begin{cases} k & \text{if } \frac{1}{2} < \theta < 1 \\ 0 & \text{otherwise,} \end{cases}$$

what is the Bayes' estimate of $\theta$ for a squared error loss if the sample consists of $X_1 = 1$ and $X_2 = 2$?

9. Suppose two observations were taken of a random variable $X$ which yielded the values 2 and 3. The density function for $X$ is

$$f(x/\theta) = \begin{cases} \dfrac{1}{\theta} & \text{if } 0 < x < \theta \\ 0 & \text{otherwise,} \end{cases}$$

and the prior distribution for the parameter $\theta$ is

$$h(\theta) = \begin{cases} 3 \, \theta^{-4} & \text{if } \theta > 1 \\ 0 & \text{otherwise.} \end{cases}$$

If the loss function is quadratic, then what is the Bayes' estimate for $\theta$?

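10. The Pareto distribution is often used in the study of incomes and has the cumulative distribution function

$$F(x; \alpha, \theta) = \begin{cases} 1 - \left( \dfrac{\alpha}{x} \right)^{\theta} & \text{if } \alpha \le x \\ 0 & \text{otherwise,} \end{cases}$$

where $0 < \alpha < \infty$ and $1 < \theta < \infty$ are parameters. Find the maximum likelihood estimates of $\alpha$ and $\theta$ based on a sample of size 5 with values 3, 5, 2, 7, 8.

11. The Pareto distribution is often used in the study of incomes and has the cumulative distribution function

$$F(x; \alpha, \theta) = \begin{cases} 1 - \left( \dfrac{\alpha}{x} \right)^{\theta} & \text{if } \alpha \le x \\ 0 & \text{otherwise,} \end{cases}$$

where $0 < \alpha < \infty$ and $1 < \theta < \infty$ are parameters. Using the moment method find estimates of $\alpha$ and $\theta$ based on a sample of size 5 with values 3, 5, 2, 7, 8.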

12. Suppose one observation was taken of a random variable $X$ which yielded the value 2. The density function for $X$ is

$$f(x/\mu) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2}(x - \mu)^2}, \qquad -\infty < x < \infty,$$

and the prior distribution of $\mu$ is

$$h(\mu) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2}\mu^2}, \qquad -\infty < \mu < \infty.$$

If the loss function is quadratic, then what is the Bayes' estimate for $\mu$?

13. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution with probability density

$$f(x; \theta) = \begin{cases} \dfrac{1}{\theta} & \text{if } 2\theta \le x \le 3\theta \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta > 0$. What is the maximum likelihood estimator of $\theta$?

14. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution with probability density

$$f(x; \theta) = \begin{cases} 1 - \theta^2 & \text{if } 0 \le x \le \dfrac{1}{1 - \theta^2} \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta > 0$. What is the maximum likelihood estimator of $\theta$?

15. Given $\theta$, the random variable $X$ has a binomial distribution with $n = 3$ and probability of success $\theta$. If the prior density of $\theta$ is

$$h(\theta) = \begin{cases} k & \text{if } \frac{1}{2} < \theta < 1 \\ 0 & \text{otherwise,} \end{cases}$$

what is the Bayes' estimate of $\theta$ for an absolute difference error loss if the sample consists of one observation $x = 1$?

16. Suppose the random variable $X$ has the cumulative distribution function $F(x)$. Show that the expected value of the random variable $(X - c)^2$ is minimum if $c$ equals the expected value of $X$.

17. Suppose the continuous random variable $X$ has the cumulative distribution function $F(x)$. Show that the expected value of the random variable $|X - c|$ is minimum if $c$ equals the median of $X$ (that is, $F(c) = 0.5$).

18. Eight independent trials are conducted of a given system with the following results: S, F, S, F, S, S, S, S, where S denotes success and F denotes failure. What is the maximum likelihood estimate of the probability of successful operation $p$?

19. What is the maximum likelihood estimate of /3 if the 5 values |, |, 1, 
|, | were drawn from the population for which f{x] (3) = \ (1 + /3) 5 (|) /3 ? 
20. If a sample of five values of $X$ is taken from the population for which $f(x; t) = 2(t - 1) t^x$, what is the maximum likelihood estimator of $t$?

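21. A sample of size $n$ is drawn from a gamma distribution

$$f(x; \beta) = \begin{cases} \dfrac{x^3 \, e^{-\frac{x}{\beta}}}{6 \beta^4} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise.} \end{cases}$$

What is the maximum likelihood estimator of $\beta$?

22. The probability density function of the random variable $X$ is defined by

$$f(x; \lambda) = \begin{cases} 1 - \frac{2}{3}\lambda + \lambda \sqrt{x} & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise.} \end{cases}$$

What is the maximum likelihood estimate of the parameter $\lambda$ based on two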
independent observations X\ = \ and £2 = yg ? 

23. Let Xx,X 2 , ■■■,X n be a random sample from a distribution with density 
function f(x: a) = | e -<dx-A*|. What is the maximum likelihood estimator of 
a ? 


24. Suppose $X_1, X_2, \ldots$ are independent random variables, each with probability of success $p$ and probability of failure $1 - p$, where $0 < p < 1$. Let $N$ be the number of observations needed to obtain the first success. What is the maximum likelihood estimator of $p$ in terms of $N$?

25. Let Xi, X- 2 - X 3 and X 4 be a random sample from the discrete distribution 
X such that 


P(X = x) = 



0 


for x = 0,1,2,..., 00 
otherwise, 


where 9 > 0. If the data are 17, 10, 32, 5, what is the maximum likelihood 
estimate of 9 ? 

26. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a population with a probability density function

$$f(x; \alpha, \lambda) = \begin{cases} \dfrac{\lambda^{\alpha}}{\Gamma(\alpha)} \, x^{\alpha - 1} e^{-\lambda x} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $\alpha$ and $\lambda$ are parameters. Using the moment method find the estimators for the parameters $\alpha$ and $\lambda$.

27. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a population distribution with the probability density function

$$f(x; p) = \binom{10}{x} p^x (1 - p)^{10 - x}$$

for $x = 0, 1, \ldots, 10$, where $p$ is a parameter. Find the Fisher information in the sample about the parameter $p$.

28. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a population distribution with the probability density function

$$f(x; \theta) = \begin{cases} \theta^2 \, x \, e^{-\theta x} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $0 < \theta$ is a parameter. Find the Fisher information in the sample about the parameter $\theta$.


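29. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a population distribution with the probability density function

$$f(x; \mu, \sigma^2) = \begin{cases} \dfrac{1}{x \sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{\ln(x) - \mu}{\sigma} \right)^2} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $-\infty < \mu < \infty$ and $0 < \sigma^2 < \infty$ are unknown parameters. Find the Fisher information matrix in the sample about the parameters $\mu$ and $\sigma^2$.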


30. Let X -\, X -2 -.... X n be a random sample of size n from a population 
distribution with the probability density function 


fix; p, A) 



if 0 < x < 00 


0 


otherwise , 


where 0 < p < 00 and 0 < A < 00 are unknown parameters. Find the Fisher 
information matrix in the sample about the parameters p and A. 


31. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution with a probability density function

$$f(x) = \begin{cases} \dfrac{1}{\Gamma(\alpha) \, \theta^{\alpha}} \, x^{\alpha - 1} e^{-\frac{x}{\theta}} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $\alpha > 0$ and $\theta > 0$ are parameters. Using the moment method find estimators for the parameters $\alpha$ and $\theta$.

32. Let Xx,X 2 , ..., X n be a random sample of sizen from a distribution with 
a probability density function 

/(x:l,) = iTi + (l-9)T -“<*<“■ 


where 0 < 9 is a parameter. Using the maximum likelihood method find an 
estimator for the parameter 9. 

33. Let Xi,X 2 , ■■■,X n be a random sample of sizen from a distribution with 
a probability density function 

f(x ; 9) = ^ , —oo < x < oo, 

where 0 < 9 is a parameter. Using the maximum likelihood method find an 
estimator for the parameter 9. 

34. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a population distribution with the probability density function

$$f(x; \lambda) = \begin{cases} \dfrac{\lambda^x e^{-\lambda}}{x!} & \text{if } x = 0, 1, \ldots, \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $\lambda > 0$ is an unknown parameter. Find the Fisher information matrix in the sample about the parameter $\lambda$.

35. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a population distribution with the probability density function

$$f(x; p) = \begin{cases} (1 - p)^{x - 1} \, p & \text{if } x = 1, \ldots, \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $0 < p < 1$ is an unknown parameter. Find the Fisher information matrix in the sample about the parameter $p$.

36. Let Xi,X 2 , ...,X n be a random sample from a population X having the 
probability density function 

ff x . = / w 9 ~ x > if 0 <x<9 

\ 0 otherwise, 
where $\theta > 0$ is a parameter. Find an estimator for $\theta$ using the moment method.

37. A box contains 50 red and blue balls out of which $\theta$ are red. A sample of 30 balls is to be selected without replacement. If $X$ denotes the number of red balls in the sample, then find an estimator for $\theta$ using the moment method.
Chapter 16

CRITERIA FOR EVALUATING THE GOODNESS OF ESTIMATORS

We have seen in Chapter 15 that, in general, different parameter estimation methods yield different estimators. For example, if $X \sim UNIF(0, \theta)$ and $X_1, X_2, \ldots, X_n$ is a random sample from the population $X$, then the estimator of $\theta$ obtained by the moment method is

$$\hat{\theta}_{MM} = 2 \bar{X}$$

whereas the estimator obtained by the maximum likelihood method is

$$\hat{\theta}_{ML} = X_{(n)}$$

where $\bar{X}$ and $X_{(n)}$ are the sample average and the $n$-th order statistic, respectively. Now the question arises: which of the two estimators is better? Thus, we need some criteria to evaluate the goodness of an estimator. Some well known criteria for evaluating the goodness of an estimator are: (1) Unbiasedness, (2) Efficiency and Relative Efficiency, (3) Uniform Minimum Variance Unbiasedness, (4) Sufficiency, and (5) Consistency.

In this chapter, we shall examine only the first four criteria in detail. The concepts of unbiasedness, efficiency and sufficiency were introduced by Sir Ronald Fisher.

16.1. The Unbiased Estimator

Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a population with probability density function $f(x; \theta)$. An estimator $\hat{\theta}$ of $\theta$ is a function of the random variables $X_1, X_2, \ldots, X_n$ which is free of the parameter $\theta$. An estimate is a realized value of an estimator that is obtained when a sample is actually taken.

Definition 16.1. An estimator $\hat{\theta}$ of $\theta$ is said to be an unbiased estimator of $\theta$ if and only if

$$E(\hat{\theta}) = \theta.$$

If $\hat{\theta}$ is not unbiased, then it is called a biased estimator of $\theta$.

An estimator of a parameter may not equal the actual value of the parameter for every realization of the sample $X_1, X_2, \ldots, X_n$, but if it is unbiased then on average it will equal the parameter.

Example 16.1. Let $X_1, X_2, ..., X_n$ be a random sample from a normal
population with mean $\mu$ and variance $\sigma^2 > 0$. Is the sample mean $\bar{X}$ an
unbiased estimator of the parameter $\mu$?

Answer: Since each $X_i \sim N(\mu, \sigma^2)$, we have

\[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right). \]

That is, the sample mean is normal with mean $\mu$ and variance $\frac{\sigma^2}{n}$. Thus

\[ E(\bar{X}) = \mu. \]

Therefore, the sample mean $\bar{X}$ is an unbiased estimator of $\mu$.

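Example 16.2. Let $X_1, X_2, ..., X_n$ be a random sample from a normal
population with mean $\mu$ and variance $\sigma^2 > 0$. What is the maximum likelihood
estimator of $\sigma^2$? Is this maximum likelihood estimator an unbiased estimator
of the parameter $\sigma^2$?

Answer: In Example 15.13, we have shown that the maximum likelihood
estimator of $\sigma^2$ is

\[ \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2. \]

Now, we examine the unbiasedness of this estimator:

\begin{align*}
E\left[\hat{\sigma}^2\right]
  &= E\left[\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2\right] \\
  &= E\left[\frac{n-1}{n} \, \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2\right] \\
  &= \frac{n-1}{n} \, E\left[S^2\right] \\
  &= \frac{\sigma^2}{n} \, E\left[\frac{n-1}{\sigma^2} S^2\right] \\
  &= \frac{\sigma^2}{n} \, E\left[\chi^2(n-1)\right]
     \qquad \left(\text{since } \frac{n-1}{\sigma^2} S^2 \sim \chi^2(n-1)\right) \\
  &= \frac{\sigma^2}{n} \, (n-1) \\
  &= \frac{n-1}{n} \, \sigma^2 \\
  &\neq \sigma^2.
\end{align*}

Therefore, the maximum likelihood estimator of $\sigma^2$ is a biased estimator.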

Next, in the following example, we show that the sample variance $S^2$
given by the expression

\[ S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \]

is an unbiased estimator of the population variance $\sigma^2$ irrespective of the
population distribution.

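Example 16.3. Let $X_1, X_2, ..., X_n$ be a random sample from a population
with mean $\mu$ and variance $\sigma^2 > 0$. Is the sample variance $S^2$ an unbiased
estimator of the population variance $\sigma^2$?

Answer: Note that the distribution of the population is not given. However,
we are given $E(X_i) = \mu$ and $E[(X_i - \mu)^2] = \sigma^2$. In order to find $E(S^2)$,
we need $E(\bar{X})$ and $E(\bar{X}^2)$. Thus we proceed to find these two expected
values. Consider

\begin{align*}
E(\bar{X}) &= E\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) \\
           &= \frac{1}{n} \sum_{i=1}^{n} E(X_i)
            = \frac{1}{n} \sum_{i=1}^{n} \mu = \mu.
\end{align*}

Similarly,

\begin{align*}
Var(\bar{X}) &= Var\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) \\
             &= \frac{1}{n^2} \sum_{i=1}^{n} Var(X_i)
              = \frac{1}{n^2} \sum_{i=1}^{n} \sigma^2 = \frac{\sigma^2}{n}.
\end{align*}

Therefore

\[ E\left(\bar{X}^2\right) = Var(\bar{X}) + E(\bar{X})^2 = \frac{\sigma^2}{n} + \mu^2. \]

Consider

\begin{align*}
E\left(S^2\right)
  &= E\left[\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2\right] \\
  &= \frac{1}{n-1} \, E\left[\sum_{i=1}^{n} \left(X_i^2 - 2\bar{X} X_i + \bar{X}^2\right)\right] \\
  &= \frac{1}{n-1} \, E\left[\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right] \\
  &= \frac{1}{n-1} \left\{\sum_{i=1}^{n} E\left[X_i^2\right] - E\left[n\bar{X}^2\right]\right\} \\
  &= \frac{1}{n-1} \left[n\left(\sigma^2 + \mu^2\right) - n\left(\mu^2 + \frac{\sigma^2}{n}\right)\right] \\
  &= \frac{1}{n-1} \left[(n-1)\,\sigma^2\right] \\
  &= \sigma^2.
\end{align*}

Therefore, the sample variance $S^2$ is an unbiased estimator of the population
variance $\sigma^2$.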
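The two results above can be checked exactly on a small example. The following is a minimal sketch, assuming a hypothetical three-point population (the values `0, 1, 5`, each with probability 1/3, are chosen only for illustration); it enumerates every equally likely sample of size 3 and averages both variance estimators with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def exact_expectations(values, n):
    """Return (E[S^2], E[MLE-type variance]) by enumerating all i.i.d. samples.

    Each of the len(values)**n ordered samples is equally likely because the
    population puts equal probability on every entry of `values`.
    """
    samples = list(product(values, repeat=n))
    total_s2 = Fraction(0)
    total_mle = Fraction(0)
    for sample in samples:
        mean = Fraction(sum(sample), n)
        ss = sum((Fraction(x) - mean) ** 2 for x in sample)
        total_s2 += ss / (n - 1)   # sample variance S^2
        total_mle += ss / n        # divisor n, as in the maximum likelihood form
    count = len(samples)
    return total_s2 / count, total_mle / count


values = [0, 1, 5]   # hypothetical population, not taken from the text
n = 3
mu = Fraction(sum(values), len(values))
sigma2 = sum((Fraction(v) - mu) ** 2 for v in values) / len(values)

e_s2, e_mle = exact_expectations(values, n)
print(e_s2 == sigma2)                         # S^2 is unbiased
print(e_mle == Fraction(n - 1, n) * sigma2)   # divisor n shrinks the mean by (n-1)/n
```

Because the population here is not normal, the check also illustrates that the unbiasedness of $S^2$ does not depend on the population distribution.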

Example 16.4. Let $X$ be a random variable with mean 2. Let $\hat{\theta}_1$ and
$\hat{\theta}_2$ be unbiased estimators of the second and third moments, respectively, of
$X$ about the origin. Find an unbiased estimator of the third moment of $X$
about its mean in terms of $\hat{\theta}_1$ and $\hat{\theta}_2$.

Answer: Since $\hat{\theta}_1$ and $\hat{\theta}_2$ are the unbiased estimators of the second and
third moments of $X$ about the origin, we get

\[ E(\hat{\theta}_1) = E(X^2) \quad \text{and} \quad E(\hat{\theta}_2) = E(X^3). \]

The third moment of $X$ about its mean is

\begin{align*}
E\left[(X - 2)^3\right] &= E\left[X^3 - 6X^2 + 12X - 8\right] \\
  &= E\left[X^3\right] - 6\,E\left[X^2\right] + 12\,E[X] - 8 \\
  &= E(\hat{\theta}_2) - 6\,E(\hat{\theta}_1) + 24 - 8 \\
  &= E\left(\hat{\theta}_2 - 6\hat{\theta}_1 + 16\right).
\end{align*}

Thus, an unbiased estimator of the third moment of $X$ about its mean is
$\hat{\theta}_2 - 6\hat{\theta}_1 + 16$.

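Example 16.5. Let $X_1, X_2, ..., X_5$ be a sample of size 5 from the uniform
distribution on the interval $(0, \theta)$, where $\theta$ is unknown. Let the estimator of
$\theta$ be $k X_{max}$, where $k$ is some constant and $X_{max}$ is the largest observation.
In order for $k X_{max}$ to be an unbiased estimator, what should be the value of
the constant $k$?

Answer: The probability density function of $X_{max}$ is given by

\[ g(x) = \frac{5!}{4!\,0!} \, [F(x)]^4 f(x)
        = 5 \left(\frac{x}{\theta}\right)^4 \frac{1}{\theta}
        = \frac{5x^4}{\theta^5}, \qquad 0 < x < \theta. \]

If $k X_{max}$ is an unbiased estimator of $\theta$, then

\begin{align*}
\theta &= E(k X_{max}) \\
       &= k \, E(X_{max}) \\
       &= k \int_0^{\theta} x \, g(x) \, dx \\
       &= k \int_0^{\theta} \frac{5x^5}{\theta^5} \, dx \\
       &= \frac{5}{6} \, k \, \theta.
\end{align*}

Hence,

\[ k = \frac{6}{5}. \]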
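A quick simulation can make the effect of the constant visible. This is only a sketch under stated assumptions: the value $\theta = 2$, the number of replications and the seed are arbitrary illustrative choices, and the scaling constant 6/5 is the one obtained by solving $\theta = \frac{5}{6} k \theta$ for a sample of size 5.

```python
import random


def mean_scaled_max(theta, k, n=5, reps=100_000, seed=0):
    """Monte Carlo average of k * X_max for samples of size n from UNIF(0, theta)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += k * max(rng.uniform(0.0, theta) for _ in range(n))
    return total / reps


theta = 2.0                                   # hypothetical true parameter
unscaled = mean_scaled_max(theta, k=1.0)      # close to (5/6) * theta
scaled = mean_scaled_max(theta, k=6 / 5)      # close to theta
print(round(unscaled, 2), round(scaled, 2))
```

The unscaled maximum underestimates $\theta$ on average, since the largest observation can never exceed $\theta$; multiplying by 6/5 removes that bias.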
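Example 16.6. Let $X_1, X_2, ..., X_n$ be a sample of size $n$ from a distribution
with unknown mean $-\infty < \mu < \infty$ and unknown variance $\sigma^2 > 0$. Show
that the statistics $\bar{X}$ and

\[ Y = \frac{X_1 + 2X_2 + \cdots + nX_n}{\frac{n(n+1)}{2}} \]

are both unbiased estimators of $\mu$. Further, show that $Var(\bar{X}) < Var(Y)$.

Answer: First, we show that $\bar{X}$ is an unbiased estimator of $\mu$:

\begin{align*}
E(\bar{X}) &= E\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) \\
           &= \frac{1}{n} \sum_{i=1}^{n} E(X_i)
            = \frac{1}{n} \sum_{i=1}^{n} \mu = \mu.
\end{align*}

Hence, the sample mean $\bar{X}$ is an unbiased estimator of the population mean
irrespective of the distribution of $X$. Next, we show that $Y$ is also an unbiased
estimator of $\mu$:

\begin{align*}
E(Y) &= E\left(\frac{X_1 + 2X_2 + \cdots + nX_n}{\frac{n(n+1)}{2}}\right) \\
     &= \frac{2}{n(n+1)} \sum_{i=1}^{n} i \, E(X_i) \\
     &= \frac{2}{n(n+1)} \sum_{i=1}^{n} i \, \mu \\
     &= \frac{2}{n(n+1)} \, \mu \, \frac{n(n+1)}{2} \\
     &= \mu.
\end{align*}

Hence, $\bar{X}$ and $Y$ are both unbiased estimators of the population mean
irrespective of the distribution of the population. The variance of $\bar{X}$ is given
by

\begin{align*}
Var\left[\bar{X}\right] &= Var\left[\frac{X_1 + X_2 + \cdots + X_n}{n}\right] \\
  &= \frac{1}{n^2} \, Var\left[X_1 + X_2 + \cdots + X_n\right] \\
  &= \frac{1}{n^2} \sum_{i=1}^{n} Var[X_i] \\
  &= \frac{\sigma^2}{n}.
\end{align*}

Similarly, the variance of $Y$ can be calculated as follows:

\begin{align*}
Var[Y] &= Var\left[\frac{X_1 + 2X_2 + \cdots + nX_n}{\frac{n(n+1)}{2}}\right] \\
  &= \frac{4}{n^2 (n+1)^2} \, Var\left[1\,X_1 + 2\,X_2 + \cdots + n\,X_n\right] \\
  &= \frac{4}{n^2 (n+1)^2} \sum_{i=1}^{n} Var[i\,X_i] \\
  &= \frac{4}{n^2 (n+1)^2} \sum_{i=1}^{n} i^2 \, Var[X_i] \\
  &= \sigma^2 \, \frac{4}{n^2 (n+1)^2} \, \frac{n(n+1)(2n+1)}{6} \\
  &= \frac{2}{3} \, \frac{2n+1}{n+1} \, \frac{\sigma^2}{n} \\
  &= \frac{2}{3} \, \frac{2n+1}{n+1} \, Var\left[\bar{X}\right].
\end{align*}

Since $\frac{2}{3} \, \frac{2n+1}{n+1} > 1$ for $n \geq 2$, we see that $Var[\bar{X}] < Var[Y]$. This shows
that although the estimators $\bar{X}$ and $Y$ are both unbiased estimators of $\mu$, yet
the variance of the sample mean $\bar{X}$ is smaller than the variance of $Y$.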

In statistics, between two unbiased estimators one prefers the estimator
which has the smaller variance. This leads to our next topic. However,
before we move to the next topic we complete this section with some known
disadvantages of the notion of unbiasedness. The first disadvantage is that
an unbiased estimator for a parameter may not exist. The second disadvantage
is that the property of unbiasedness is not invariant under functional
transformation, that is, if $\hat{\theta}$ is an unbiased estimator of $\theta$ and $g$ is a function,
then $g(\hat{\theta})$ may not be an unbiased estimator of $g(\theta)$.
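As a short illustration of the second disadvantage, consider the sample mean $\bar{X}$ of a random sample of size $n$ from a population with mean $\mu$ and finite variance $\sigma^2 > 0$, together with the function $g(t) = t^2$ (this choice of $g$ is an assumption made here for illustration). Using $Var(\bar{X}) = \sigma^2 / n$, one gets:

```latex
\[
E\left(\bar{X}^2\right)
  = Var\left(\bar{X}\right) + \left[E\left(\bar{X}\right)\right]^2
  = \frac{\sigma^2}{n} + \mu^2
  \neq \mu^2 .
\]
```

So $\bar{X}$ is unbiased for $\mu$, yet $\bar{X}^2$ overestimates $\mu^2$ by $\sigma^2 / n$ on average, a bias that shrinks as $n$ grows.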

16.2. The Relatively Efficient Estimator

We have seen in Example 16.6 that the sample mean

\[ \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]

and the statistic

\[ Y = \frac{X_1 + 2X_2 + \cdots + nX_n}{1 + 2 + \cdots + n} \]

are both unbiased estimators of the population mean. However, we have also seen
that

\[ Var(\bar{X}) < Var(Y). \]

The following figure graphically illustrates the shape of the distributions of
both the unbiased estimators.

[Figure: distributions of $\bar{X}$ and $Y$]

If an unbiased estimator has a smaller variance or dispersion, then it has
a greater chance of being close to the true parameter $\theta$. Therefore when two
estimators of $\theta$ are both unbiased, one should pick the one with the
smaller variance.

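Definition 16.2. Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two unbiased estimators of $\theta$. The
estimator $\hat{\theta}_1$ is said to be more efficient than $\hat{\theta}_2$ if

\[ Var(\hat{\theta}_1) < Var(\hat{\theta}_2). \]

The ratio $\eta$ given by

\[ \eta(\hat{\theta}_1, \hat{\theta}_2) = \frac{Var(\hat{\theta}_2)}{Var(\hat{\theta}_1)} \]

is called the relative efficiency of $\hat{\theta}_1$ with respect to $\hat{\theta}_2$.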

Example 16.7. Let $X_1, X_2, X_3$ be a random sample of size 3 from a
population with mean $\mu$ and variance $\sigma^2 > 0$. If the statistics $\bar{X}$ and $Y$ given
by

\[ Y = \frac{X_1 + 2X_2 + 3X_3}{6} \]

are two unbiased estimators of the population mean $\mu$, then which one is
more efficient?

Answer: Since $E(X_i) = \mu$ and $Var(X_i) = \sigma^2$, we get

\begin{align*}
E(\bar{X}) &= E\left(\frac{X_1 + X_2 + X_3}{3}\right) \\
  &= \frac{1}{3} \left(E(X_1) + E(X_2) + E(X_3)\right) \\
  &= \frac{1}{3} \, 3\mu = \mu
\end{align*}

and

\begin{align*}
E(Y) &= E\left(\frac{X_1 + 2X_2 + 3X_3}{6}\right) \\
  &= \frac{1}{6} \left(E(X_1) + 2E(X_2) + 3E(X_3)\right) \\
  &= \frac{1}{6} \, 6\mu = \mu.
\end{align*}

Therefore both $\bar{X}$ and $Y$ are unbiased. Next we determine the variance of
both the estimators. The variances of these estimators are given by

\begin{align*}
Var(\bar{X}) &= Var\left(\frac{X_1 + X_2 + X_3}{3}\right) \\
  &= \frac{1}{9} \left[Var(X_1) + Var(X_2) + Var(X_3)\right] \\
  &= \frac{1}{9} \, 3\sigma^2 = \frac{12}{36} \, \sigma^2
\end{align*}

and

\begin{align*}
Var(Y) &= Var\left(\frac{X_1 + 2X_2 + 3X_3}{6}\right) \\
  &= \frac{1}{36} \left[Var(X_1) + 4\,Var(X_2) + 9\,Var(X_3)\right] \\
  &= \frac{1}{36} \, 14\sigma^2 = \frac{14}{36} \, \sigma^2.
\end{align*}

Therefore

\[ \frac{12}{36} \, \sigma^2 = Var(\bar{X}) < Var(Y) = \frac{14}{36} \, \sigma^2. \]

Hence, $\bar{X}$ is more efficient than the estimator $Y$. Further, the relative
efficiency of $\bar{X}$ with respect to $Y$ is given by

\[ \eta(\bar{X}, Y) = \frac{14}{12} = \frac{7}{6}. \]
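The variance calculations for linear estimators of this kind follow one pattern: for independent observations with common variance $\sigma^2$, the variance of $\sum w_i X_i$ is $\sigma^2 \sum w_i^2$. The sketch below (function names are illustrative, not from the text) uses that fact to compute the relative efficiency of the sample mean with respect to the weighted statistic $Y$ with weights $2i / (n(n+1))$, in exact rational arithmetic.

```python
from fractions import Fraction


def variance_factor(weights):
    """Var(sum w_i X_i) / sigma^2 for independent X_i with common variance."""
    return sum(Fraction(w) ** 2 for w in weights)


def relative_efficiency(weights_1, weights_2):
    """eta(T1, T2) = Var(T2) / Var(T1) for two unbiased linear estimators."""
    return variance_factor(weights_2) / variance_factor(weights_1)


def weights_of_mean(n):
    return [Fraction(1, n)] * n


def weights_of_y(n):
    # Y = (X_1 + 2 X_2 + ... + n X_n) / (n (n + 1) / 2)
    return [Fraction(2 * i, n * (n + 1)) for i in range(1, n + 1)]


eta = relative_efficiency(weights_of_mean(3), weights_of_y(3))
print(eta)  # 7/6
```

For general $n$ the same function reproduces the factor $\frac{2}{3} \, \frac{2n+1}{n+1}$, which equals 1 only when $n = 1$.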


Example 16.8. Let $X_1, X_2, ..., X_n$ be a random sample of size $n$ from a
population with density

\[ f(x; \theta) = \begin{cases} \frac{1}{\theta} \, e^{-\frac{x}{\theta}} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise,} \end{cases} \]

where $\theta > 0$ is a parameter. Are the estimators $X_1$ and $\bar{X}$ unbiased? Given
$X_1$ and $\bar{X}$, which one is the more efficient estimator of $\theta$?

Answer: Since the population $X$ is exponential with parameter $\theta$, that is
$X \sim EXP(\theta)$, its mean and variance are given by

\[ E(X) = \theta \quad \text{and} \quad Var(X) = \theta^2. \]

Since $X_1, X_2, ..., X_n$ is a random sample from $X$, we see that the statistic
$X_1 \sim EXP(\theta)$. Hence, the expected value of $X_1$ is $\theta$ and thus it is an
unbiased estimator of the parameter $\theta$. Also, the sample mean is an unbiased
estimator of $\theta$ since

\begin{align*}
E(\bar{X}) &= \frac{1}{n} \sum_{i=1}^{n} E(X_i) \\
           &= \frac{1}{n} \, n\theta \\
           &= \theta.
\end{align*}

Next, we compute the variances of the unbiased estimators $X_1$ and $\bar{X}$. It is
easy to see that

\[ Var(X_1) = \theta^2 \]

and

\begin{align*}
Var(\bar{X}) &= Var\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) \\
  &= \frac{1}{n^2} \sum_{i=1}^{n} Var(X_i) \\
  &= \frac{1}{n^2} \, n\theta^2 \\
  &= \frac{\theta^2}{n}.
\end{align*}

Hence, for $n \geq 2$,

\[ \frac{\theta^2}{n} = Var(\bar{X}) < Var(X_1) = \theta^2. \]

Thus $\bar{X}$ is more efficient than $X_1$ and the relative efficiency of $\bar{X}$ with respect
to $X_1$ is

\[ \eta(\bar{X}, X_1) = \frac{\theta^2}{\frac{\theta^2}{n}} = n. \]
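The relative efficiency $n$ can also be seen empirically. The following sketch is a simulation under assumed values ($\theta = 2$, $n = 5$, a fixed seed and 20,000 replications, all chosen only for illustration); it draws exponential samples with mean $\theta$ and compares the empirical variances of the first observation and of the sample mean.

```python
import random
from statistics import fmean, variance


def simulate(theta, n, reps=20_000, seed=1):
    """Return lists of X_1 and of the sample mean over `reps` simulated samples."""
    rng = random.Random(seed)
    first, means = [], []
    for _ in range(reps):
        # expovariate takes the rate 1/theta, so each draw has mean theta
        sample = [rng.expovariate(1.0 / theta) for _ in range(n)]
        first.append(sample[0])
        means.append(fmean(sample))
    return first, means


theta, n = 2.0, 5                      # hypothetical parameter and sample size
first, means = simulate(theta, n)
ratio = variance(first) / variance(means)
print(round(fmean(first), 2), round(fmean(means), 2))  # both near theta
print(round(ratio, 1))                                 # near n
```

Both estimators centre on $\theta$, but the sample mean is far less dispersed, with a variance ratio close to the sample size.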


Example 16.9. Let $X_1, X_2, X_3$ be a random sample of size 3 from a
population with density

\[ f(x; \lambda) = \begin{cases} \frac{\lambda^x e^{-\lambda}}{x!} & \text{if } x = 0, 1, 2, ... \\ 0 & \text{otherwise,} \end{cases} \]

where $\lambda$ is a parameter. Are the estimators given by

\[ \hat{\lambda}_1 = \frac{1}{4} \left(X_1 + 2X_2 + X_3\right)
   \quad \text{and} \quad
   \hat{\lambda}_2 = \frac{1}{9} \left(4X_1 + 3X_2 + 2X_3\right) \]

unbiased? Given $\hat{\lambda}_1$ and $\hat{\lambda}_2$, which one is the more efficient estimator of $\lambda$?
Find an unbiased estimator of $\lambda$ whose variance is smaller than the variances
of $\hat{\lambda}_1$ and $\hat{\lambda}_2$.

Answer: Since each $X_i \sim POI(\lambda)$, we get

\[ E(X_i) = \lambda \quad \text{and} \quad Var(X_i) = \lambda. \]

It is easy to see that

\begin{align*}
E(\hat{\lambda}_1) &= \frac{1}{4} \left(E(X_1) + 2E(X_2) + E(X_3)\right) \\
  &= \frac{1}{4} \, 4\lambda = \lambda
\end{align*}

and

\begin{align*}
E(\hat{\lambda}_2) &= \frac{1}{9} \left(4E(X_1) + 3E(X_2) + 2E(X_3)\right) \\
  &= \frac{1}{9} \, 9\lambda = \lambda.
\end{align*}

Thus, both $\hat{\lambda}_1$ and $\hat{\lambda}_2$ are unbiased estimators of $\lambda$. Now we compute their
variances to find out which one is more efficient. It is easy to note that

\begin{align*}
Var(\hat{\lambda}_1) &= \frac{1}{16} \left(Var(X_1) + 4\,Var(X_2) + Var(X_3)\right) \\
  &= \frac{1}{16} \, 6\lambda = \frac{486}{1296} \, \lambda
\end{align*}

and

\begin{align*}
Var(\hat{\lambda}_2) &= \frac{1}{81} \left(16\,Var(X_1) + 9\,Var(X_2) + 4\,Var(X_3)\right) \\
  &= \frac{1}{81} \, 29\lambda = \frac{464}{1296} \, \lambda.
\end{align*}

Since

\[ Var(\hat{\lambda}_2) < Var(\hat{\lambda}_1), \]

the estimator $\hat{\lambda}_2$ is more efficient than the estimator $\hat{\lambda}_1$. We have seen in Section
16.1 that the sample mean is always an unbiased estimator of the population
mean irrespective of the population distribution. The variance of the sample
mean always equals $\frac{1}{n}$ times the population variance, where $n$ denotes
the sample size. Hence, we get

\[ Var(\bar{X}) = \frac{\lambda}{3} = \frac{432}{1296} \, \lambda. \]

Therefore, we get

\[ Var(\bar{X}) < Var(\hat{\lambda}_2) < Var(\hat{\lambda}_1). \]

Thus, the sample mean has even smaller variance than the two unbiased
estimators given in this example.

In view of this example, we have now encountered a new problem: how to
find an unbiased estimator which has the smallest variance among
all unbiased estimators of a given parameter. We resolve this issue in the
next section.

16.3. The Uniform Minimum Variance Unbiased Estimator

Let $X_1, X_2, ..., X_n$ be a random sample of size $n$ from a population with
probability density function $f(x; \theta)$. Recall that an estimator $\hat{\theta}$ of $\theta$ is a
function of the random variables $X_1, X_2, ..., X_n$ which does not depend on $\theta$.

Definition 16.3. An unbiased estimator $\hat{\theta}$ of $\theta$ is said to be a uniform
minimum variance unbiased estimator of $\theta$ if and only if

\[ Var(\hat{\theta}) \leq Var(\hat{T}) \]

for any unbiased estimator $\hat{T}$ of $\theta$.

If an estimator $\hat{\theta}$ is unbiased then the mean of this estimator is equal to
the parameter $\theta$, that is

\[ E(\hat{\theta}) = \theta, \]

and the variance of $\hat{\theta}$ is

\[ Var(\hat{\theta}) = E\left[\left(\hat{\theta} - E(\hat{\theta})\right)^2\right]
                     = E\left[\left(\hat{\theta} - \theta\right)^2\right]. \]

This variance, if it exists, is a function of the unbiased estimator $\hat{\theta}$ and it has a
minimum in the class of all unbiased estimators of $\theta$. Therefore we have an
alternative definition of the uniform minimum variance unbiased estimator.

Definition 16.4. An unbiased estimator $\hat{\theta}$ of $\theta$ is said to be a uniform
minimum variance unbiased estimator of $\theta$ if it minimizes the variance
$E\left[(\hat{\theta} - \theta)^2\right]$.

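Example 16.10. Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be unbiased estimators of $\theta$. Suppose
$Var(\hat{\theta}_1) = 1$, $Var(\hat{\theta}_2) = 2$ and $Cov(\hat{\theta}_1, \hat{\theta}_2) = \frac{1}{2}$. What are the
values of $c_1$ and $c_2$ for which $c_1 \hat{\theta}_1 + c_2 \hat{\theta}_2$ is an unbiased estimator of $\theta$ with
minimum variance among unbiased estimators of this type?

Answer: We want $c_1 \hat{\theta}_1 + c_2 \hat{\theta}_2$ to be a minimum variance unbiased estimator
of $\theta$. Then

\begin{align*}
E\left[c_1 \hat{\theta}_1 + c_2 \hat{\theta}_2\right] = \theta
  &\Rightarrow c_1 \, E[\hat{\theta}_1] + c_2 \, E[\hat{\theta}_2] = \theta \\
  &\Rightarrow c_1 \theta + c_2 \theta = \theta \\
  &\Rightarrow c_1 + c_2 = 1 \\
  &\Rightarrow c_2 = 1 - c_1.
\end{align*}

Therefore

\begin{align*}
Var\left[c_1 \hat{\theta}_1 + c_2 \hat{\theta}_2\right]
  &= c_1^2 \, Var[\hat{\theta}_1] + c_2^2 \, Var[\hat{\theta}_2] + 2 c_1 c_2 \, Cov(\hat{\theta}_1, \hat{\theta}_2) \\
  &= c_1^2 + 2c_2^2 + c_1 c_2 \\
  &= c_1^2 + 2(1 - c_1)^2 + c_1 (1 - c_1) \\
  &= 2(1 - c_1)^2 + c_1 \\
  &= 2 + 2c_1^2 - 3c_1.
\end{align*}

Hence, the variance $Var\left[c_1 \hat{\theta}_1 + c_2 \hat{\theta}_2\right]$ is a function of $c_1$. Let us denote this
function by $\phi(c_1)$, that is

\[ \phi(c_1) := Var\left[c_1 \hat{\theta}_1 + c_2 \hat{\theta}_2\right] = 2 + 2c_1^2 - 3c_1. \]

Taking the derivative of $\phi(c_1)$ with respect to $c_1$, we get

\[ \frac{d}{dc_1} \phi(c_1) = 4c_1 - 3. \]

Setting this derivative to zero and solving for $c_1$, we obtain

\[ 4c_1 - 3 = 0 \quad \Rightarrow \quad c_1 = \frac{3}{4}. \]

Therefore

\[ c_2 = 1 - c_1 = 1 - \frac{3}{4} = \frac{1}{4}. \]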
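The same minimization can be carried out for arbitrary variances and covariance. The sketch below is an illustration, not part of the original derivation: it assumes the denominator $Var(\hat{\theta}_1) + Var(\hat{\theta}_2) - 2\,Cov(\hat{\theta}_1, \hat{\theta}_2)$ is positive, sets the derivative of the combined variance to zero, and is evaluated at variances 1 and 2 with covariance 1/2, the value implied by the expansion $c_1^2 + 2c_2^2 + c_1 c_2$.

```python
from fractions import Fraction


def best_weights(var1, var2, cov):
    """Weights (c1, c2), c1 + c2 = 1, minimizing Var(c1*T1 + c2*T2).

    Setting d/dc1 [c1^2 var1 + (1-c1)^2 var2 + 2 c1 (1-c1) cov] = 0 gives
    c1 = (var2 - cov) / (var1 + var2 - 2 cov); the denominator is assumed > 0.
    """
    denom = var1 + var2 - 2 * cov
    c1 = (var2 - cov) / denom
    return c1, 1 - c1


def combined_variance(c1, var1, var2, cov):
    c2 = 1 - c1
    return c1**2 * var1 + c2**2 * var2 + 2 * c1 * c2 * cov


var1, var2, cov = Fraction(1), Fraction(2), Fraction(1, 2)
c1, c2 = best_weights(var1, var2, cov)
v_min = combined_variance(c1, var1, var2, cov)
print(c1, c2, v_min)  # 3/4 1/4 7/8
```

The minimum variance 7/8 is smaller than the variance of either estimator used alone, which is the point of combining them.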


In Example 16.10, we saw that if $\hat{\theta}_1$ and $\hat{\theta}_2$ are any two unbiased
estimators of $\theta$, then $c\,\hat{\theta}_1 + (1 - c)\,\hat{\theta}_2$ is also an unbiased estimator of $\theta$ for any
$c \in \mathbb{R}$. Hence given two estimators $\hat{\theta}_1$ and $\hat{\theta}_2$,

\[ \mathcal{C} = \left\{ \hat{\theta} \mid \hat{\theta} = c\,\hat{\theta}_1 + (1 - c)\,\hat{\theta}_2, \; c \in \mathbb{R} \right\} \]

forms an uncountable class of unbiased estimators of $\theta$. When the variances
of $\hat{\theta}_1$ and $\hat{\theta}_2$ are known along with their covariance, then as in Example
16.10 we are able to determine the minimum variance unbiased estimator
in the class $\mathcal{C}$. If the variances of the estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ are not known,
then it is very difficult to find the minimum variance estimator even in the
class of estimators $\mathcal{C}$. Notice that $\mathcal{C}$ is a subset of the class of all unbiased
estimators, and finding a minimum variance unbiased estimator in this larger class
is a difficult task.

One way to find a uniform minimum variance unbiased estimator for a
parameter is to use the Cramér-Rao lower bound or the Fisher information
inequality.

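Theorem 16.1. Let $X_1, X_2, ..., X_n$ be a random sample of size $n$ from a
population $X$ with probability density $f(x; \theta)$, where $\theta$ is a scalar parameter.
Let $\hat{\theta}$ be any unbiased estimator of $\theta$. Suppose the likelihood function $L(\theta)$
is a differentiable function of $\theta$ and satisfies

\[ \frac{d}{d\theta} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} h(x_1, ..., x_n) \, L(\theta) \, dx_1 \cdots dx_n
   = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} h(x_1, ..., x_n) \, \frac{d}{d\theta} L(\theta) \, dx_1 \cdots dx_n \qquad (1) \]

for any $h(x_1, ..., x_n)$ with $E(h(X_1, ..., X_n)) < \infty$. Then

\[ Var(\hat{\theta}) \geq \frac{1}{E\left[\left(\frac{\partial \ln L(\theta)}{\partial \theta}\right)^2\right]}. \qquad (CR1) \]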


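Proof: Since $L(\theta)$ is the joint probability density function of the sample
$X_1, X_2, ..., X_n$,

\[ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} L(\theta) \, dx_1 \cdots dx_n = 1. \qquad (2) \]

Differentiating (2) with respect to $\theta$ we have

\[ \frac{d}{d\theta} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} L(\theta) \, dx_1 \cdots dx_n = 0 \]

and use of (1) with $h(x_1, ..., x_n) = 1$ yields

\[ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \frac{d}{d\theta} L(\theta) \, dx_1 \cdots dx_n = 0. \qquad (3) \]

Rewriting (3) as

\[ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \frac{dL(\theta)}{d\theta} \, \frac{1}{L(\theta)} \, L(\theta) \, dx_1 \cdots dx_n = 0 \]

we see that

\[ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \frac{d \ln L(\theta)}{d\theta} \, L(\theta) \, dx_1 \cdots dx_n = 0. \]

Hence

\[ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \theta \, \frac{d \ln L(\theta)}{d\theta} \, L(\theta) \, dx_1 \cdots dx_n = 0. \qquad (4) \]

Since $\hat{\theta}$ is an unbiased estimator of $\theta$, we see that

\[ E(\hat{\theta}) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \hat{\theta} \, L(\theta) \, dx_1 \cdots dx_n = \theta. \qquad (5) \]

Differentiating (5) with respect to $\theta$, we have

\[ \frac{d}{d\theta} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \hat{\theta} \, L(\theta) \, dx_1 \cdots dx_n = 1. \]

Again using (1) with $h(X_1, ..., X_n) = \hat{\theta}$, we have

\[ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \hat{\theta} \, \frac{d}{d\theta} L(\theta) \, dx_1 \cdots dx_n = 1. \qquad (6) \]

Rewriting (6) as

\[ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \hat{\theta} \, \frac{dL(\theta)}{d\theta} \, \frac{1}{L(\theta)} \, L(\theta) \, dx_1 \cdots dx_n = 1 \]

we have

\[ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \hat{\theta} \, \frac{d \ln L(\theta)}{d\theta} \, L(\theta) \, dx_1 \cdots dx_n = 1. \qquad (7) \]

From (4) and (7), we obtain

\[ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left(\hat{\theta} - \theta\right) \frac{d \ln L(\theta)}{d\theta} \, L(\theta) \, dx_1 \cdots dx_n = 1. \qquad (8) \]

By the Cauchy-Schwarz inequality,

\begin{align*}
1 &= \left( \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left(\hat{\theta} - \theta\right) \frac{d \ln L(\theta)}{d\theta} \, L(\theta) \, dx_1 \cdots dx_n \right)^2 \\
  &\leq \left( \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left(\hat{\theta} - \theta\right)^2 L(\theta) \, dx_1 \cdots dx_n \right)
        \cdot \left( \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left(\frac{d \ln L(\theta)}{d\theta}\right)^2 L(\theta) \, dx_1 \cdots dx_n \right) \\
  &= Var(\hat{\theta}) \; E\left[\left(\frac{\partial \ln L(\theta)}{\partial \theta}\right)^2\right].
\end{align*}

Therefore

\[ Var(\hat{\theta}) \geq \frac{1}{E\left[\left(\frac{\partial \ln L(\theta)}{\partial \theta}\right)^2\right]} \]

and the proof of the theorem is now complete.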

If $L(\theta)$ is twice differentiable with respect to $\theta$, the inequality (CR1) can
be stated equivalently as

\[ Var(\hat{\theta}) \geq \frac{-1}{E\left[\frac{\partial^2 \ln L(\theta)}{\partial \theta^2}\right]}. \qquad (CR2) \]

The inequalities (CR1) and (CR2) are known as the Cramér-Rao lower bound
for the variance of $\hat{\theta}$ or the Fisher information inequality. Condition
(1) permits interchanging the order of integration and differentiation. Therefore any
distribution whose range depends on the value of the parameter is not covered
by this theorem. Hence a distribution like the uniform distribution may not
be analyzed using the Cramér-Rao lower bound.

If an unbiased estimator $\hat{\theta}$ attains the Cramér-Rao lower bound, that is,
if equality holds in (CR1), then it has minimum variance among all unbiased
estimators. We state this as a theorem without giving a proof.

Theorem 16.2. Let $X_1, X_2, ..., X_n$ be a random sample of size $n$ from a
population $X$ with probability density $f(x; \theta)$, where $\theta$ is a parameter. If $\hat{\theta}$
is an unbiased estimator of $\theta$ and

\[ Var(\hat{\theta}) = \frac{1}{E\left[\left(\frac{\partial \ln L(\theta)}{\partial \theta}\right)^2\right]}, \]

then $\hat{\theta}$ is a uniform minimum variance unbiased estimator of $\theta$. The converse
of this is not true.

Definition 16.5. An unbiased estimator $\hat{\theta}$ is called an efficient estimator if
it attains the Cramér-Rao lower bound, that is

\[ Var(\hat{\theta}) = \frac{1}{E\left[\left(\frac{\partial \ln L(\theta)}{\partial \theta}\right)^2\right]}. \]

In view of the above theorem it is easy to note that an efficient estimator
of a parameter is always a uniform minimum variance unbiased estimator of
that parameter. However, not every uniform minimum variance unbiased
estimator of a parameter is efficient. In other words, not every uniform minimum
variance unbiased estimator of a parameter attains the Cramér-Rao lower
bound

\[ Var(\hat{\theta}) \geq \frac{1}{E\left[\left(\frac{\partial \ln L(\theta)}{\partial \theta}\right)^2\right]}. \]



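Example 16.11. Let $X_1, X_2, ..., X_n$ be a random sample of size $n$ from a
distribution with density function

\[ f(x; \theta) = \begin{cases} 3\theta x^2 e^{-\theta x^3} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise.} \end{cases} \]

What is the Cramér-Rao lower bound for the variance of an unbiased estimator
of the parameter $\theta$?

Answer: Let $\hat{\theta}$ be an unbiased estimator of $\theta$. The Cramér-Rao lower bound for
the variance of $\hat{\theta}$ is given by

\[ Var(\hat{\theta}) \geq \frac{-1}{E\left[\frac{\partial^2 \ln L(\theta)}{\partial \theta^2}\right]}, \]

where $L(\theta)$ denotes the likelihood function of the given random sample
$X_1, X_2, ..., X_n$. Since the likelihood function of the sample is

\[ L(\theta) = \prod_{i=1}^{n} 3\theta x_i^2 e^{-\theta x_i^3}, \]

we get

\[ \ln L(\theta) = n \ln \theta + \sum_{i=1}^{n} \ln\left(3x_i^2\right) - \theta \sum_{i=1}^{n} x_i^3, \]

\[ \frac{\partial \ln L(\theta)}{\partial \theta} = \frac{n}{\theta} - \sum_{i=1}^{n} x_i^3, \]

and

\[ \frac{\partial^2 \ln L(\theta)}{\partial \theta^2} = -\frac{n}{\theta^2}. \]

Hence, using this in the Cramér-Rao inequality, we get

\[ Var(\hat{\theta}) \geq \frac{\theta^2}{n}. \]

Thus the Cramér-Rao lower bound for the variance of an unbiased estimator
of $\theta$ is $\frac{\theta^2}{n}$.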
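The second derivative of the log-likelihood can be sanity-checked numerically. The sketch below assumes the density $f(x; \theta) = 3\theta x^2 e^{-\theta x^3}$ for $x > 0$, a handful of made-up observations, and a central finite difference with a hypothetical step size; none of these numerical choices come from the text.

```python
import math


def log_likelihood(theta, xs):
    """ln L(theta) for f(x; theta) = 3 * theta * x**2 * exp(-theta * x**3), x > 0."""
    return sum(math.log(3 * theta * x**2) - theta * x**3 for x in xs)


def second_derivative(f, t, h=1e-3):
    """Central finite-difference approximation of f''(t)."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / h**2


xs = [0.4, 0.9, 1.3, 0.7]   # hypothetical observations
theta = 1.5                 # hypothetical parameter value
n = len(xs)

d2 = second_derivative(lambda t: log_likelihood(t, xs), theta)
# The second derivative does not involve the data, so its expectation is itself
# and the Cramer-Rao bound is -1 / d2.
bound = -1 / d2
print(round(d2, 4), round(-n / theta**2, 4))
print(round(bound, 4), round(theta**2 / n, 4))
```

Changing the observations leaves `d2` unchanged up to rounding, which is why taking the expectation in the bound is immediate for this density.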

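Example 16.12. Let $X_1, X_2, ..., X_n$ be a random sample from a normal
population with unknown mean $\mu$ and known variance $\sigma^2 > 0$. What is the
maximum likelihood estimator of $\mu$? Is this maximum likelihood estimator
an efficient estimator of $\mu$?

Answer: The probability density function of the population is

\[ f(x; \mu) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2\sigma^2}(x - \mu)^2}. \]

Thus

\[ \ln f(x; \mu) = -\frac{1}{2} \ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2} (x - \mu)^2 \]

and hence

\[ \ln L(\mu) = -\frac{n}{2} \ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2. \]

Taking the derivative of $\ln L(\mu)$ with respect to $\mu$, we get

\[ \frac{d \ln L(\mu)}{d\mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu). \]

Setting this derivative to zero and solving for $\mu$, we see that $\hat{\mu} = \bar{X}$.
The variance of $\bar{X}$ is given by

\[ Var(\bar{X}) = Var\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \frac{\sigma^2}{n}. \]

Next we determine the Cramér-Rao lower bound for the estimator $\bar{X}$.
We already know that

\[ \frac{d \ln L(\mu)}{d\mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) \]

and hence

\[ \frac{d^2 \ln L(\mu)}{d\mu^2} = -\frac{n}{\sigma^2}. \]

Therefore

\[ E\left(\frac{d^2 \ln L(\mu)}{d\mu^2}\right) = -\frac{n}{\sigma^2} \]

and

\[ \frac{-1}{E\left(\frac{d^2 \ln L(\mu)}{d\mu^2}\right)} = \frac{\sigma^2}{n}. \]

Thus

\[ Var(\bar{X}) = \frac{-1}{E\left(\frac{d^2 \ln L(\mu)}{d\mu^2}\right)} \]

and $\bar{X}$ is an efficient estimator of $\mu$. Since every efficient estimator is a
uniform minimum variance unbiased estimator, $\bar{X}$ is a uniform
minimum variance unbiased estimator of $\mu$.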

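Example 16.13. Let $X_1, X_2, ..., X_n$ be a random sample from a normal
population with known mean $\mu$ and unknown variance $\sigma^2 > 0$. What is the
maximum likelihood estimator of $\sigma^2$? Is this maximum likelihood estimator
a uniform minimum variance unbiased estimator of $\sigma^2$?

Answer: Let us write $\theta = \sigma^2$. Then

\[ f(x; \theta) = \frac{1}{\sqrt{2\pi\theta}} \, e^{-\frac{1}{2\theta}(x - \mu)^2} \]

and

\[ \ln L(\theta) = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln(\theta) - \frac{1}{2\theta} \sum_{i=1}^{n} (x_i - \mu)^2. \]

Differentiating $\ln L(\theta)$ with respect to $\theta$, we have

\[ \frac{d}{d\theta} \ln L(\theta) = -\frac{n}{2} \, \frac{1}{\theta} + \frac{1}{2\theta^2} \sum_{i=1}^{n} (x_i - \mu)^2. \]

Setting this derivative to zero and solving for $\theta$, we see that

\[ \hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2. \]

Next we show that this estimator is unbiased. For this we consider

\begin{align*}
E(\hat{\theta}) &= E\left(\frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2\right) \\
  &= \frac{\sigma^2}{n} \, E\left(\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2\right) \\
  &= \frac{\theta}{n} \, E\left(\chi^2(n)\right) \\
  &= \frac{\theta}{n} \, n = \theta.
\end{align*}

Hence $\hat{\theta}$ is an unbiased estimator of $\theta$. The variance of $\hat{\theta}$ can be obtained as
follows:

\begin{align*}
Var(\hat{\theta}) &= Var\left(\frac{\sigma^2}{n} \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2\right) \\
  &= \frac{\theta^2}{n^2} \, Var\left(\chi^2(n)\right) \\
  &= \frac{\theta^2}{n^2} \, 2n \\
  &= \frac{2\theta^2}{n} = \frac{2\sigma^4}{n}.
\end{align*}

Finally we determine the Cramér-Rao lower bound for the variance of $\hat{\theta}$. The
second derivative of $\ln L(\theta)$ with respect to $\theta$ is

\[ \frac{d^2 \ln L(\theta)}{d\theta^2} = \frac{n}{2\theta^2} - \frac{1}{\theta^3} \sum_{i=1}^{n} (x_i - \mu)^2. \]

Hence

\begin{align*}
E\left(\frac{d^2 \ln L(\theta)}{d\theta^2}\right)
  &= \frac{n}{2\theta^2} - \frac{1}{\theta^3} \, E\left(\sum_{i=1}^{n} (X_i - \mu)^2\right) \\
  &= \frac{n}{2\theta^2} - \frac{n}{\theta^2} \\
  &= -\frac{n}{2\theta^2}.
\end{align*}

Thus

\[ \frac{-1}{E\left(\frac{d^2 \ln L(\theta)}{d\theta^2}\right)} = \frac{2\theta^2}{n} = \frac{2\sigma^4}{n}. \]

Therefore

\[ Var(\hat{\theta}) = \frac{-1}{E\left(\frac{d^2 \ln L(\theta)}{d\theta^2}\right)}. \]

Hence $\hat{\theta}$ is an efficient estimator of $\theta$. Since every efficient estimator is a
uniform minimum variance unbiased estimator, $\frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2$
is a uniform minimum variance unbiased estimator of $\sigma^2$.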


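Example 16.14. Let $X_1, X_2, ..., X_n$ be a random sample of size $n$ from a
normal population with known mean $\mu$ and variance $\sigma^2 > 0$. Show that
$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$ is an unbiased estimator of $\sigma^2$. Further, show that $S^2$
cannot attain the Cramér-Rao lower bound.

Answer: From Example 16.3, we know that $S^2$ is an unbiased estimator of
$\sigma^2$. The variance of $S^2$ can be computed as follows:

\begin{align*}
Var\left(S^2\right) &= Var\left(\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2\right) \\
  &= \frac{\sigma^4}{(n-1)^2} \, Var\left(\sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{\sigma}\right)^2\right) \\
  &= \frac{\sigma^4}{(n-1)^2} \, Var\left(\chi^2(n-1)\right) \\
  &= \frac{\sigma^4}{(n-1)^2} \, 2(n-1) \\
  &= \frac{2\sigma^4}{n-1}.
\end{align*}

Next we let $\theta = \sigma^2$ and determine the Cramér-Rao lower bound for the
variance of $S^2$. The second derivative of $\ln L(\theta)$ with respect to $\theta$ is

\[ \frac{d^2 \ln L(\theta)}{d\theta^2} = \frac{n}{2\theta^2} - \frac{1}{\theta^3} \sum_{i=1}^{n} (x_i - \mu)^2. \]

Hence

\begin{align*}
E\left(\frac{d^2 \ln L(\theta)}{d\theta^2}\right)
  &= \frac{n}{2\theta^2} - \frac{1}{\theta^3} \, E\left(\sum_{i=1}^{n} (X_i - \mu)^2\right) \\
  &= \frac{n}{2\theta^2} - \frac{n}{\theta^2} \\
  &= -\frac{n}{2\theta^2}.
\end{align*}

Thus

\[ \frac{-1}{E\left(\frac{d^2 \ln L(\theta)}{d\theta^2}\right)} = \frac{2\theta^2}{n} = \frac{2\sigma^4}{n}. \]

Hence

\[ \frac{2\sigma^4}{n-1} = Var\left(S^2\right) > \frac{-1}{E\left(\frac{d^2 \ln L(\theta)}{d\theta^2}\right)} = \frac{2\sigma^4}{n}. \]

This shows that $S^2$ cannot attain the Cramér-Rao lower bound.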
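The gap between the two variance estimators of a normal population can be observed in a simulation. This is a sketch under assumed values (standard normal population, $n = 5$, a fixed seed and 40,000 replications are illustrative choices): it compares the estimator that uses the known mean, $\frac{1}{n} \sum (X_i - \mu)^2$, with the sample variance $S^2$.

```python
import random
from statistics import fmean, pvariance


def simulate_variance_estimators(mu, sigma, n, reps=40_000, seed=2):
    """Simulate (1/n) * sum (x - mu)^2 and S^2 over `reps` normal samples."""
    rng = random.Random(seed)
    known_mean, sample_var = [], []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(xs)
        known_mean.append(sum((x - mu) ** 2 for x in xs) / n)
        sample_var.append(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return known_mean, sample_var


mu, sigma, n = 0.0, 1.0, 5            # hypothetical population and sample size
known_mean, sample_var = simulate_variance_estimators(mu, sigma, n)

bound = 2 * sigma**4 / n              # Cramer-Rao lower bound 2 sigma^4 / n
var_known = pvariance(known_mean)     # should sit near the bound
var_s2 = pvariance(sample_var)        # should sit near 2 sigma^4 / (n - 1)
print(round(var_known, 2), round(bound, 2))
print(round(var_s2, 2), round(2 * sigma**4 / (n - 1), 2))
```

Both estimators average to $\sigma^2$, but only the one that uses the known mean has variance at the bound; $S^2$ pays for estimating the mean with one lost degree of freedom.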
The disadvantages of the Cramér-Rao lower bound approach are the
following: (1) not every density function $f(x; \theta)$ satisfies the assumptions
of the Cramér-Rao theorem, and (2) not every allowable estimator attains the
Cramér-Rao lower bound. Hence in either of these situations, one does
not know whether an estimator is a uniform minimum variance unbiased
estimator or not.

16.4. Sufficient Estimator

In many situations, we cannot easily find the distribution of the
estimator $\hat{\theta}$ of a parameter $\theta$ even though we know the distribution of the
population. Therefore, we have no way to know whether our estimator $\hat{\theta}$ is
unbiased or biased. Hence, we need some other criterion to judge the quality
of an estimator. Sufficiency is one such criterion for judging the quality of an
estimator.

Recall that an estimator of a population parameter is a function of the
sample values that does not contain the parameter. An estimator summarizes
the information found in the sample about the parameter. If an estimator
summarizes just as much information about the parameter being estimated
as the sample does, then the estimator is called a sufficient estimator.

Definition 16.6. Let $X \sim f(x; \theta)$ be a population and let $X_1, X_2, ..., X_n$
be a random sample of size $n$ from this population $X$. An estimator $\hat{\theta}$ of
the parameter $\theta$ is said to be a sufficient estimator of $\theta$ if the conditional
distribution of the sample given the estimator $\hat{\theta}$ does not depend on the
parameter $\theta$.

Example 16.15. Let $X_1, X_2, ..., X_n$ be a random sample from the distribution
with probability density function

\[ f(x; \theta) = \begin{cases} \theta^x (1 - \theta)^{1-x} & \text{if } x = 0, 1 \\ 0 & \text{elsewhere,} \end{cases} \]

where $0 < \theta < 1$. Show that $Y = \sum_{i=1}^{n} X_i$ is a sufficient statistic of $\theta$.

Answer: First, we find the distribution of the sample. This is given by

\[ f(x_1, x_2, ..., x_n) = \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{1 - x_i} = \theta^y (1 - \theta)^{n-y}, \]

where $y = \sum_{i=1}^{n} x_i$. Since each $X_i \sim BER(\theta)$, we have

\[ Y = \sum_{i=1}^{n} X_i \sim BIN(n, \theta). \]

Therefore, the probability density function of $Y$ is given by

\[ g(y) = \binom{n}{y} \theta^y (1 - \theta)^{n-y}. \]

If $X_1 = x_1, X_2 = x_2, ..., X_n = x_n$ and $Y = \sum_{i=1}^{n} x_i$, then

\[ f(x_1, x_2, ..., x_n, y) = \begin{cases} f(x_1, x_2, ..., x_n) & \text{if } y = \sum_{i=1}^{n} x_i \\ 0 & \text{otherwise.} \end{cases} \]

Now, we find the conditional density of the sample given the estimator
$Y$, that is

\begin{align*}
f(x_1, x_2, ..., x_n \,/\, Y = y) &= \frac{f(x_1, x_2, ..., x_n, y)}{g(y)} \\
  &= \frac{f(x_1, x_2, ..., x_n)}{g(y)} \\
  &= \frac{\theta^y (1 - \theta)^{n-y}}{\binom{n}{y} \theta^y (1 - \theta)^{n-y}} \\
  &= \frac{1}{\binom{n}{y}}.
\end{align*}

Hence, the conditional density of the sample given the statistic $Y$ is
independent of the parameter $\theta$. Therefore, by definition $Y$ is a sufficient statistic.
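The definition of sufficiency can be checked directly for Bernoulli samples by brute force. The following sketch (the sample size 4, the total 2 and the two parameter values are arbitrary illustrative choices) computes the conditional probability of every 0/1 sample given its sum, using exact fractions, for two different values of $\theta$.

```python
from fractions import Fraction
from itertools import product
from math import comb


def conditional_probs(theta, n, y):
    """P(X_1 = x_1, ..., X_n = x_n | Y = y) for every 0/1 sample with sum y."""
    joint = {}
    for sample in product((0, 1), repeat=n):
        if sum(sample) == y:
            joint[sample] = theta**y * (1 - theta) ** (n - y)
    p_y = sum(joint.values())   # P(Y = y), found by summing the joint pmf
    return {s: p / p_y for s, p in joint.items()}


n, y = 4, 2
probs_a = conditional_probs(Fraction(1, 3), n, y)
probs_b = conditional_probs(Fraction(3, 4), n, y)

print(probs_a == probs_b)                # the parameter has dropped out
print(set(probs_a.values()))             # a single value, 1 / C(4, 2)
```

Every arrangement of the successes is equally likely once the total is known, whatever the value of $\theta$; this is exactly the statement that the total carries all the information about $\theta$.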


Example 16.16. Let $X_1, X_2, ..., X_n$ be a random sample from the distribution
with probability density function

\[ f(x; \theta) = \begin{cases} e^{-(x - \theta)} & \text{if } \theta < x < \infty \\ 0 & \text{elsewhere,} \end{cases} \]

where $-\infty < \theta < \infty$. What is the maximum likelihood estimator of $\theta$? Is
this maximum likelihood estimator a sufficient estimator of $\theta$?

Answer: We have seen in Chapter 15 that the maximum likelihood estimator
of $\theta$ is $Y = X_{(1)}$, that is, the first order statistic of the sample. Let us find
the probability density of this statistic, which is given by

\begin{align*}
g(y) &= \frac{n!}{(n-1)!} \, [F(y)]^0 f(y) \, [1 - F(y)]^{n-1} \\
     &= n f(y) \, [1 - F(y)]^{n-1} \\
     &= n e^{-(y - \theta)} \left[1 - \left\{1 - e^{-(y - \theta)}\right\}\right]^{n-1} \\
     &= n e^{n\theta} e^{-ny}.
\end{align*}

The probability density of the random sample is

\begin{align*}
f(x_1, x_2, ..., x_n) &= \prod_{i=1}^{n} e^{-(x_i - \theta)} \\
                      &= e^{n\theta} e^{-n\bar{x}},
\end{align*}

where $n\bar{x} = \sum_{i=1}^{n} x_i$. Let $A$ be the event $(X_1 = x_1, X_2 = x_2, ..., X_n = x_n)$ and
let $B$ denote the event $(Y = y)$. Then $A \subset B$ and therefore $A \cap B = A$. Now,
we find the conditional density of the sample given the estimator $Y$, that is

\begin{align*}
f(x_1, x_2, ..., x_n \,/\, Y = y) &= P(X_1 = x_1, X_2 = x_2, ..., X_n = x_n \,/\, Y = y) \\
  &= P(A / B) \\
  &= \frac{P(A \cap B)}{P(B)} \\
  &= \frac{P(A)}{P(B)} \\
  &= \frac{f(x_1, x_2, ..., x_n)}{g(y)} \\
  &= \frac{e^{n\theta} e^{-n\bar{x}}}{n e^{n\theta} e^{-ny}} \\
  &= \frac{e^{-n\bar{x}}}{n e^{-ny}}.
\end{align*}

Hence, the conditional density of the sample given the statistic $Y$ is
independent of the parameter $\theta$. Therefore, by definition $Y$ is a sufficient statistic.

We have seen that to verify whether an estimator is sufficient or not one
has to examine the conditional density of the sample given the estimator. To
compute this conditional density one has to use the density of the estimator.
The density of the estimator is not always easy to find. Therefore, verifying
the sufficiency of an estimator using this definition is not always easy. The
following factorization theorem of Fisher and Neyman helps to decide when
an estimator is sufficient.

Theorem 16.3. Let $X_1, X_2, ..., X_n$ denote a random sample with
probability density function $f(x_1, x_2, ..., x_n; \theta)$, which depends on the population
parameter $\theta$. The estimator $\hat{\theta}$ is sufficient for $\theta$ if and only if

\[ f(x_1, x_2, ..., x_n; \theta) = \phi(\hat{\theta}, \theta) \, h(x_1, x_2, ..., x_n), \]

where $\phi$ depends on $x_1, x_2, ..., x_n$ only through $\hat{\theta}$ and $h(x_1, x_2, ..., x_n)$ does
not depend on $\theta$.

Now we give two examples to illustrate the factorization theorem.


Example 16.17. Let $X_1, X_2, ..., X_n$ be a random sample from a distribution
with density function

\[ f(x; \lambda) = \begin{cases} \frac{\lambda^x e^{-\lambda}}{x!} & \text{if } x = 0, 1, 2, ... \\ 0 & \text{elsewhere,} \end{cases} \]

where $\lambda > 0$ is a parameter. Find the maximum likelihood estimator of $\lambda$ and
show that the maximum likelihood estimator of $\lambda$ is a sufficient estimator of
the parameter $\lambda$.

Answer: First, we find the density of the sample or the likelihood function
of the sample. The likelihood function of the sample is given by

\begin{align*}
L(\lambda) &= \prod_{i=1}^{n} f(x_i; \lambda) \\
           &= \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} \\
           &= \frac{\lambda^{n\bar{x}} e^{-n\lambda}}{\prod_{i=1}^{n} (x_i!)}.
\end{align*}

Taking the logarithm of the likelihood function, we get

\[ \ln L(\lambda) = n\bar{x} \ln \lambda - n\lambda - \ln \prod_{i=1}^{n} (x_i!). \]

Therefore

\[ \frac{d}{d\lambda} \ln L(\lambda) = \frac{1}{\lambda} \, n\bar{x} - n. \]

Setting this derivative to zero and solving for $\lambda$, we get

\[ \lambda = \bar{x}. \]

The second derivative test assures us that the above $\lambda$ is a maximum. Hence,
the maximum likelihood estimator of $\lambda$ is the sample mean $\bar{X}$. Next, we
show that $\bar{X}$ is sufficient, by using the Factorization Theorem of Fisher and
Neyman. We factor the joint density of the sample as

\begin{align*}
L(\lambda) &= \frac{\lambda^{n\bar{x}} e^{-n\lambda}}{\prod_{i=1}^{n} (x_i!)} \\
           &= \left[\lambda^{n\bar{x}} e^{-n\lambda}\right] \frac{1}{\prod_{i=1}^{n} (x_i!)} \\
           &= \phi(\bar{X}, \lambda) \, h(x_1, x_2, ..., x_n).
\end{align*}

Therefore, the estimator $\bar{X}$ is a sufficient estimator of $\lambda$.
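One practical consequence of the factorization is that the factor $h$ cancels in any likelihood ratio, so two Poisson samples with the same total give the same ratio $L(\lambda_1) / L(\lambda_2)$. The sketch below checks this numerically; the two samples and the values of $\lambda$ are made up for illustration.

```python
import math


def log_likelihood(lam, xs):
    """Poisson log-likelihood: sum of x ln(lam) - lam - ln(x!)."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)


def log_ratio(xs, lam1, lam2):
    """ln[L(lam1) / L(lam2)]; the h(x_1, ..., x_n) factor cancels here."""
    return log_likelihood(lam1, xs) - log_likelihood(lam2, xs)


sample_a = [0, 3, 2, 7]   # hypothetical data, total 12
sample_b = [3, 3, 3, 3]   # different data, same total 12
sample_c = [1, 1, 1, 1]   # different total

ratio_a = log_ratio(sample_a, 2.0, 5.0)
ratio_b = log_ratio(sample_b, 2.0, 5.0)
ratio_c = log_ratio(sample_c, 2.0, 5.0)
print(math.isclose(ratio_a, ratio_b, abs_tol=1e-9))  # same total, same ratio
print(math.isclose(ratio_a, ratio_c, abs_tol=1e-9))  # different total, different ratio
```

Any comparison between parameter values therefore needs only the sample total (equivalently, the sample mean), not the individual observations.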

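Example 16.18. Let $X_1, X_2, ..., X_n$ be a random sample from a normal
distribution with density function

\[ f(x; \mu) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2}(x - \mu)^2}, \]

where $-\infty < \mu < \infty$ is a parameter. Find the maximum likelihood estimator
of $\mu$ and show that the maximum likelihood estimator of $\mu$ is a sufficient
estimator.

Answer: We know that the maximum likelihood estimator of $\mu$ is the sample
mean $\bar{X}$. Next, we show that this maximum likelihood estimator $\bar{X}$ is a
sufficient estimator of $\mu$. The joint density of the sample is given by

\begin{align*}
f(x_1, x_2, ..., x_n; \mu)
  &= \prod_{i=1}^{n} f(x_i; \mu) \\
  &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2}(x_i - \mu)^2} \\
  &= \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^2} \\
  &= \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{2} \sum_{i=1}^{n} \left[(x_i - \bar{x}) + (\bar{x} - \mu)\right]^2} \\
  &= \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{2} \sum_{i=1}^{n} \left[(x_i - \bar{x})^2 + 2(x_i - \bar{x})(\bar{x} - \mu) + (\bar{x} - \mu)^2\right]} \\
  &= \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{2} \sum_{i=1}^{n} \left[(x_i - \bar{x})^2 + (\bar{x} - \mu)^2\right]} \\
  &= \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{n}{2} (\bar{x} - \mu)^2} \, e^{-\frac{1}{2} \sum_{i=1}^{n} (x_i - \bar{x})^2}.
\end{align*}

The cross term drops out because $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$. The factor
$e^{-\frac{n}{2} (\bar{x} - \mu)^2}$ depends on the sample only through $\bar{x}$, and the remaining factor
does not depend on $\mu$. Hence, by the Factorization Theorem, $\bar{X}$ is a sufficient
estimator of the population mean.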


Note that the probability density function of Example 16.17, which
is

\[ f(x; \lambda) = \begin{cases} \frac{\lambda^x e^{-\lambda}}{x!} & \text{if } x = 0, 1, 2, ... \\ 0 & \text{elsewhere,} \end{cases} \]

can be written as

\[ f(x; \lambda) = e^{\{x \ln \lambda - \ln x! - \lambda\}} \]

for $x = 0, 1, 2, ...$ This density function is of the form

\[ f(x; \lambda) = e^{\{K(x) A(\lambda) + S(x) + B(\lambda)\}}. \]

Similarly, the probability density function of Example 16.18, which is

\[ f(x; \mu) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2}(x - \mu)^2}, \]

can also be written as

\[ f(x; \mu) = e^{\left\{x\mu - \frac{x^2}{2} - \frac{\mu^2}{2} - \frac{1}{2} \ln(2\pi)\right\}}. \]

This probability density function is of the form

\[ f(x; \mu) = e^{\{K(x) A(\mu) + S(x) + B(\mu)\}}. \]

We have also seen that in both the examples, the sufficient estimator was
the sample mean $\bar{X}$, which can be written as $\frac{1}{n} \sum_{i=1}^{n} X_i$.

Our next theorem gives a general result in this direction. The following
theorem is known as the Pitman-Koopman theorem.

Theorem 16.4. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution
with probability density function of the exponential form

$$ f(x; \theta) = e^{\{K(x)A(\theta) + S(x) + B(\theta)\}} $$

on a support free of $\theta$. Then the statistic $\sum_{i=1}^{n} K(X_i)$ is a sufficient statistic
for the parameter $\theta$.

Proof: The joint density of the sample is

$$
\begin{aligned}
f(x_1, x_2, \ldots, x_n; \theta)
&= \prod_{i=1}^{n} f(x_i; \theta) \\
&= \prod_{i=1}^{n} e^{\{K(x_i)A(\theta) + S(x_i) + B(\theta)\}} \\
&= e^{\left\{\sum_{i=1}^{n} K(x_i)A(\theta) + \sum_{i=1}^{n} S(x_i) + nB(\theta)\right\}} \\
&= e^{\left\{\sum_{i=1}^{n} K(x_i)A(\theta) + nB(\theta)\right\}}\; e^{\sum_{i=1}^{n} S(x_i)}.
\end{aligned}
$$

Hence by the Factorization Theorem the estimator $\sum_{i=1}^{n} K(X_i)$ is a sufficient
statistic for the parameter $\theta$. This completes the proof.

Example 16.19. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution
with density function

$$
f(x; \theta) =
\begin{cases}
\theta x^{\theta-1} & \text{for } 0 < x < 1 \\
0 & \text{otherwise},
\end{cases}
$$

where $\theta > 0$ is a parameter. Using the Pitman-Koopman Theorem find a
sufficient estimator of $\theta$.

Answer: The Pitman-Koopman Theorem says that if the probability density
function can be expressed in the form of

$$ f(x; \theta) = e^{\{K(x)A(\theta) + S(x) + B(\theta)\}} $$

then $\sum_{i=1}^{n} K(X_i)$ is a sufficient statistic for $\theta$. The given population density
can be written as

$$
\begin{aligned}
f(x; \theta) &= \theta x^{\theta-1} \\
&= e^{\{\ln[\theta x^{\theta-1}]\}} \\
&= e^{\{\ln\theta + (\theta-1)\ln x\}}.
\end{aligned}
$$

Thus,

$$
\begin{aligned}
K(x) &= \ln x, & A(\theta) &= \theta, \\
S(x) &= -\ln x, & B(\theta) &= \ln\theta.
\end{aligned}
$$

Hence by the Pitman-Koopman Theorem,

$$ \sum_{i=1}^{n} K(X_i) = \sum_{i=1}^{n} \ln X_i = \ln \prod_{i=1}^{n} X_i. $$

Thus $\ln \prod_{i=1}^{n} X_i$ is a sufficient statistic for $\theta$.

Remark 16.1. Notice that $\prod_{i=1}^{n} X_i$ is also a sufficient statistic of $\theta$, since
knowing $\ln\left(\prod_{i=1}^{n} X_i\right)$, we also know $\prod_{i=1}^{n} X_i$.
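A small numerical check can make the meaning of sufficiency in Example 16.19 concrete. The sketch below is only an illustration: the function name `joint_density` and the two samples are hypothetical choices, not part of the text. Both samples have the same product, so the joint density $\theta^n \left(\prod x_i\right)^{\theta-1}$ should agree for every $\theta$.

```python
import math

def joint_density(xs, theta):
    """Joint density of an i.i.d. sample from f(x; theta) = theta * x**(theta - 1) on (0, 1)."""
    return math.prod(theta * x ** (theta - 1) for x in xs)

# Two different (hypothetical) samples chosen so that both products equal 0.1,
# i.e. both give the same value of the sufficient statistic sum(ln x_i).
sample_a = [0.2, 0.5]
sample_b = [0.25, 0.4]

for theta in (0.5, 1.0, 2.0, 5.0):
    la = joint_density(sample_a, theta)
    lb = joint_density(sample_b, theta)
    # The two columns should agree up to floating-point rounding.
    print(theta, la, lb)
```

The data enter the likelihood only through $\prod x_i$, which is what the Factorization Theorem expresses.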

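Example 16.20. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution
with density function

$$
f(x; \theta) =
\begin{cases}
\dfrac{1}{\theta}\, e^{-\frac{x}{\theta}} & \text{for } 0 < x < \infty \\[2mm]
0 & \text{otherwise},
\end{cases}
$$

where $0 < \theta < \infty$ is a parameter. Find a sufficient estimator of $\theta$.

Answer: First, we rewrite the population density in the exponential form.
That is

$$
\begin{aligned}
f(x; \theta) &= \frac{1}{\theta}\, e^{-\frac{x}{\theta}} \\
&= e^{\ln\left[\frac{1}{\theta} e^{-\frac{x}{\theta}}\right]} \\
&= e^{-\ln\theta - \frac{x}{\theta}}.
\end{aligned}
$$

Hence

$$
\begin{aligned}
K(x) &= x, & A(\theta) &= -\frac{1}{\theta}, \\
S(x) &= 0, & B(\theta) &= -\ln\theta.
\end{aligned}
$$

Hence by the Pitman-Koopman Theorem,

$$ \sum_{i=1}^{n} K(X_i) = \sum_{i=1}^{n} X_i = n\overline{X}. $$

Thus, $n\overline{X}$ is a sufficient statistic for $\theta$. Since knowing $n\overline{X}$, we also know $\overline{X}$,
the estimator $\overline{X}$ is also a sufficient estimator of $\theta$.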


Example 16.21. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution
with density function

$$
f(x; \theta) =
\begin{cases}
e^{-(x-\theta)} & \text{for } \theta < x < \infty \\
0 & \text{otherwise},
\end{cases}
$$

where $-\infty < \theta < \infty$ is a parameter. Can the Pitman-Koopman Theorem be
used to find a sufficient statistic for $\theta$?

Answer: No. We cannot use the Pitman-Koopman Theorem to find a sufficient
statistic for $\theta$ since the domain where the population density is nonzero is
not free of $\theta$.

Next, we present the connection between the maximum likelihood estimator
and the sufficient estimator. If there is a sufficient estimator for the
parameter $\theta$ and if the maximum likelihood estimator of this $\theta$ is unique, then
the maximum likelihood estimator is a function of the sufficient estimator.
That is

$$ \hat{\theta}_{ML} = \psi(\hat{\theta}_{S}), $$

where $\psi$ is a real valued function, $\hat{\theta}_{ML}$ is the maximum likelihood estimator
of $\theta$, and $\hat{\theta}_{S}$ is the sufficient estimator of $\theta$.

Similarly, a connection can be established between the uniform minimum
variance unbiased estimator and the sufficient estimator of a parameter $\theta$. If
there is a sufficient estimator for the parameter $\theta$ and if the uniform minimum
variance unbiased estimator of this $\theta$ is unique, then the uniform minimum
variance unbiased estimator is a function of the sufficient estimator. That is

$$ \hat{\theta}_{MVUE} = \eta(\hat{\theta}_{S}), $$

where $\eta$ is a real valued function, $\hat{\theta}_{MVUE}$ is the uniform minimum variance
unbiased estimator of $\theta$, and $\hat{\theta}_{S}$ is the sufficient estimator of $\theta$.

Finally, we may ask “If there are sufficient estimators, why are there not
necessary estimators?” In fact, there are. Dynkin (1951) gave the following
definition.

Definition 16.7. An estimator is said to be a necessary estimator if it can
be written as a function of every sufficient estimator.

16.5. Consistent Estimator 


Let $X_1, X_2, \ldots, X_n$ be a random sample from a population $X$ with density
$f(x; \theta)$. Let $\hat{\theta}$ be an estimator of $\theta$ based on the sample of size $n$. Obviously
the estimator depends on the sample size $n$. In order to reflect the dependency
of $\hat{\theta}$ on $n$, we denote $\hat{\theta}$ as $\hat{\theta}_n$.

Definition 16.7. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population
$X$ with density $f(x; \theta)$. A sequence of estimators $\{\hat{\theta}_n\}$ of $\theta$ is said to be
consistent for $\theta$ if and only if the sequence $\{\hat{\theta}_n\}$ converges in probability to
$\theta$, that is, for any $\epsilon > 0$

$$ \lim_{n \to \infty} P\left(\left|\hat{\theta}_n - \theta\right| \geq \epsilon\right) = 0. $$

Note that consistency is actually a concept relating to a sequence of
estimators $\{\hat{\theta}_n\}_{n=n_0}^{\infty}$, but we usually say “consistency of $\hat{\theta}_n$” for simplicity.
Further, consistency is a large sample property of an estimator.

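The following theorem states that if the mean squared error goes to zero
as $n$ goes to infinity, then $\{\hat{\theta}_n\}$ converges in probability to $\theta$.

Theorem 16.5. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population
$X$ with density $f(x; \theta)$ and $\{\hat{\theta}_n\}$ be a sequence of estimators of $\theta$ based on
the sample. If the variance of $\hat{\theta}_n$ exists for each $n$ and is finite and

$$ \lim_{n \to \infty} E\left(\left(\hat{\theta}_n - \theta\right)^2\right) = 0 $$

then, for any $\epsilon > 0$,

$$ \lim_{n \to \infty} P\left(\left|\hat{\theta}_n - \theta\right| \geq \epsilon\right) = 0. $$

Proof: By the Markov Inequality (see Theorem 13.8) we have

$$ P\left(\left(\hat{\theta}_n - \theta\right)^2 \geq \epsilon^2\right) \leq \frac{E\left(\left(\hat{\theta}_n - \theta\right)^2\right)}{\epsilon^2} $$

for all $\epsilon > 0$. Since the events

$$ \left(\hat{\theta}_n - \theta\right)^2 \geq \epsilon^2 \quad \text{and} \quad \left|\hat{\theta}_n - \theta\right| \geq \epsilon $$

are the same, we see that

$$ P\left(\left(\hat{\theta}_n - \theta\right)^2 \geq \epsilon^2\right) = P\left(\left|\hat{\theta}_n - \theta\right| \geq \epsilon\right) \leq \frac{E\left(\left(\hat{\theta}_n - \theta\right)^2\right)}{\epsilon^2} $$

for all $n \in \mathbb{N}$. Hence if

$$ \lim_{n \to \infty} E\left(\left(\hat{\theta}_n - \theta\right)^2\right) = 0 $$

then

$$ \lim_{n \to \infty} P\left(\left|\hat{\theta}_n - \theta\right| \geq \epsilon\right) = 0 $$

and the proof of the theorem is complete.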

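Let

$$ B\left(\hat{\theta}, \theta\right) = E\left(\hat{\theta}\right) - \theta $$

be the bias. If an estimator is unbiased, then $B\left(\hat{\theta}, \theta\right) = 0$. Next we show
that

$$ E\left(\left(\hat{\theta} - \theta\right)^2\right) = Var\left(\hat{\theta}\right) + \left[B\left(\hat{\theta}, \theta\right)\right]^2. \tag{1} $$

To see this consider

$$
\begin{aligned}
E\left(\left(\hat{\theta} - \theta\right)^2\right)
&= E\left(\hat{\theta}^2 - 2\hat{\theta}\theta + \theta^2\right) \\
&= E\left(\hat{\theta}^2\right) - 2E\left(\hat{\theta}\right)\theta + \theta^2 \\
&= E\left(\hat{\theta}^2\right) - E\left(\hat{\theta}\right)^2 + E\left(\hat{\theta}\right)^2 - 2E\left(\hat{\theta}\right)\theta + \theta^2 \\
&= Var\left(\hat{\theta}\right) + E\left(\hat{\theta}\right)^2 - 2E\left(\hat{\theta}\right)\theta + \theta^2 \\
&= Var\left(\hat{\theta}\right) + \left[E\left(\hat{\theta}\right) - \theta\right]^2 \\
&= Var\left(\hat{\theta}\right) + \left[B\left(\hat{\theta}, \theta\right)\right]^2.
\end{aligned}
$$

In view of (1), we can say that if

$$ \lim_{n \to \infty} Var\left(\hat{\theta}_n\right) = 0 \tag{2} $$

and

$$ \lim_{n \to \infty} B\left(\hat{\theta}_n, \theta\right) = 0 \tag{3} $$

then

$$ \lim_{n \to \infty} E\left(\left(\hat{\theta}_n - \theta\right)^2\right) = 0. $$

In other words, to show a sequence of estimators is consistent we have to
verify the limits (2) and (3).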

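Example 16.22. Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal
population $X$ with mean $\mu$ and variance $\sigma^2 > 0$. Is the maximum likelihood estimator

$$ \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \overline{X}\right)^2 $$

of $\sigma^2$ a consistent estimator of $\sigma^2$?

Answer: Since $\hat{\sigma}^2$ depends on the sample size $n$, we denote $\hat{\sigma}^2$ as $\hat{\sigma}^2_n$. Hence

$$ \hat{\sigma}^2_n = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \overline{X}\right)^2. $$

The variance of $\hat{\sigma}^2_n$ is given by

$$
\begin{aligned}
Var\left(\hat{\sigma}^2_n\right)
&= Var\left(\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \overline{X}\right)^2\right) \\
&= \frac{1}{n^2}\, Var\left(\sigma^2\, \frac{(n-1)S^2}{\sigma^2}\right) \\
&= \frac{\sigma^4}{n^2}\, Var\left(\frac{(n-1)S^2}{\sigma^2}\right) \\
&= \frac{\sigma^4}{n^2}\, Var\left(\chi^2(n-1)\right) \\
&= \frac{2(n-1)\sigma^4}{n^2} \\
&= \left[\frac{1}{n} - \frac{1}{n^2}\right] 2\sigma^4.
\end{aligned}
$$

Hence

$$ \lim_{n \to \infty} Var\left(\hat{\sigma}^2_n\right) = \lim_{n \to \infty} \left[\frac{1}{n} - \frac{1}{n^2}\right] 2\sigma^4 = 0. $$

The bias $B\left(\hat{\sigma}^2_n, \sigma^2\right)$ is given by

$$
\begin{aligned}
B\left(\hat{\sigma}^2_n, \sigma^2\right)
&= E\left(\hat{\sigma}^2_n\right) - \sigma^2 \\
&= E\left(\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \overline{X}\right)^2\right) - \sigma^2 \\
&= \frac{1}{n}\, E\left(\sigma^2\, \frac{(n-1)S^2}{\sigma^2}\right) - \sigma^2 \\
&= \frac{\sigma^2}{n}\, E\left(\chi^2(n-1)\right) - \sigma^2 \\
&= \frac{(n-1)\sigma^2}{n} - \sigma^2 \\
&= -\frac{\sigma^2}{n}.
\end{aligned}
$$

Thus

$$ \lim_{n \to \infty} B\left(\hat{\sigma}^2_n, \sigma^2\right) = -\lim_{n \to \infty} \frac{\sigma^2}{n} = 0. $$

Hence $\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \overline{X}\right)^2$ is a consistent estimator of $\sigma^2$.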
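The two limits in Example 16.22 can be tabulated directly from the closed forms derived there. The following is a minimal sketch, assuming only those formulas; the function name `mle_variance_moments` and the value $\sigma^2 = 4$ are illustrative choices.

```python
def mle_variance_moments(n, sigma2):
    """Variance, bias and mean squared error of (1/n) * sum((X_i - Xbar)**2)
    for a normal sample, using the closed forms from Example 16.22."""
    var = 2 * (n - 1) * sigma2 ** 2 / n ** 2   # 2(n-1) sigma^4 / n^2
    bias = -sigma2 / n                         # -sigma^2 / n
    mse = var + bias ** 2                      # identity (1): MSE = Var + bias^2
    return var, bias, mse

# Hypothetical population variance used only for illustration.
for n in (10, 100, 1000, 10000):
    print(n, mle_variance_moments(n, 4.0))
```

All three columns shrink toward zero as $n$ grows, which is the content of limits (2) and (3).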

In the last example we saw that the maximum likelihood estimator of the variance is a
consistent estimator. In general, if the density function $f(x; \theta)$ of a population
satisfies some mild conditions, then the maximum likelihood estimator of $\theta$ is
consistent. Similarly, if the density function $f(x; \theta)$ of a population satisfies
some mild conditions, then the estimator obtained by the moment method is also
consistent.


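Let $X_1, X_2, \ldots, X_n$ be a random sample from a population $X$ with density
function $f(x; \theta)$, where $\theta$ is a parameter. One can generate a consistent
estimator using the moment method as follows. First, find a function $U(x)$ such
that

$$ E(U(X)) = g(\theta) $$

where $g$ is a function of $\theta$. If $g$ is a one-to-one function of $\theta$ and has a
continuous inverse, then the estimator

$$ \hat{\theta}_{MM} = g^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} U(X_i)\right) $$

is consistent for $\theta$. To see this, by the law of large numbers, we get

$$ \frac{1}{n}\sum_{i=1}^{n} U(X_i) \xrightarrow{P} E(U(X)). $$

Hence

$$ \frac{1}{n}\sum_{i=1}^{n} U(X_i) \xrightarrow{P} g(\theta) $$

and therefore

$$ g^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} U(X_i)\right) \xrightarrow{P} \theta. $$

Thus

$$ \hat{\theta}_{MM} \xrightarrow{P} \theta $$

and $\hat{\theta}_{MM}$ is a consistent estimator of $\theta$.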


Example 16.23. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution
with density function

$$
f(x; \theta) =
\begin{cases}
\dfrac{1}{\theta}\, e^{-\frac{x}{\theta}} & \text{for } 0 < x < \infty \\[2mm]
0 & \text{otherwise},
\end{cases}
$$

where $0 < \theta < \infty$ is a parameter. Using the moment method find a consistent
estimator of $\theta$.

Answer: Let $U(x) = x$. Then

$$ g(\theta) = E(U(X)) = \theta. $$

The function $g(x) = x$ for $x > 0$ is a one-to-one function and continuous.
Moreover, the inverse of $g$ is given by $g^{-1}(x) = x$. Thus

$$
\begin{aligned}
\hat{\theta}_n &= g^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} U(X_i)\right) \\
&= g^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) \\
&= g^{-1}\left(\overline{X}\right) \\
&= \overline{X}.
\end{aligned}
$$

Therefore,

$$ \hat{\theta}_n = \overline{X} $$

is a consistent estimator of $\theta$.

Since consistency is a large sample property of an estimator, some statisticians
suggest that consistency should not be used alone for judging the
goodness of an estimator; rather it should be used along with other criteria.
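Consistency of the moment estimator in Example 16.23 can also be observed numerically. The sketch below is an illustration under stated assumptions: the true value $\theta = 2$, the seed, and the sample sizes are arbitrary, and `moment_estimate` is a hypothetical helper name.

```python
import random

def moment_estimate(sample):
    """Moment estimator of theta for the density (1/theta) * exp(-x/theta):
    with U(x) = x and g(theta) = theta, the estimator is the sample mean."""
    return sum(sample) / len(sample)

random.seed(0)          # fixed seed so the run is repeatable
theta = 2.0             # hypothetical true parameter
for n in (10, 1000, 100000):
    # random.expovariate takes the rate 1/theta, so the mean is theta
    sample = [random.expovariate(1.0 / theta) for _ in range(n)]
    print(n, moment_estimate(sample))
```

The printed estimates should settle near 2 as $n$ increases, although any single run only suggests, and does not prove, convergence in probability.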

16.6. Review Exercises 

1. Let $T_1$ and $T_2$ be estimators of a population parameter $\theta$ based upon the
same random sample. If $T_i \sim N(\theta, \sigma_i^2)$, $i = 1, 2$, and if $T = bT_1 + (1-b)T_2$,
then for what value of $b$ is $T$ a minimum variance unbiased estimator of $\theta$?

2. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with density
function

$$ f(x; \theta) = \frac{1}{2\theta}\, e^{-\frac{|x|}{\theta}}, \qquad -\infty < x < \infty, $$

where $0 < \theta$ is a parameter. What is the expected value of the maximum
likelihood estimator of $\theta$? Is this estimator unbiased?

3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with density
function

$$ f(x; \theta) = \frac{1}{2\theta}\, e^{-\frac{|x|}{\theta}}, \qquad -\infty < x < \infty, $$

where $0 < \theta$ is a parameter. Is the maximum likelihood estimator an efficient
estimator of $\theta$?

4. A random sample $X_1, X_2, \ldots, X_n$ of size $n$ is selected from a normal
distribution with variance $\sigma^2$. Let $S^2$ be the unbiased estimator of $\sigma^2$, and $T$
be the maximum likelihood estimator of $\sigma^2$. If $20T - 19S^2 = 0$, then what is
the sample size?

5. Suppose $X$ and $Y$ are independent random variables each with density
function

$$
f(x) =
\begin{cases}
2x\theta^2 & \text{for } 0 < x < \frac{1}{\theta} \\
0 & \text{otherwise}.
\end{cases}
$$

If $k(X + 2Y)$ is an unbiased estimator of $\theta^{-1}$, then what is the value of $k$?

6. An object of length $c$ is measured by two persons using the same
instrument. The instrument error has a normal distribution with mean 0 and
variance 1. The first person measures the object 25 times, and the average
of the measurements is $\overline{X} = 12$. The second person measures the object 36
times, and the average of the measurements is $\overline{Y} = 12.8$. To estimate $c$ we
use the weighted average $a\overline{X} + b\overline{Y}$ as an estimator. Determine the constants
$a$ and $b$ such that $a\overline{X} + b\overline{Y}$ is the minimum variance unbiased estimator of
$c$ and then calculate the minimum variance unbiased estimate of $c$.

7. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with probability
density function

$$
f(x) =
\begin{cases}
3\theta x^2 e^{-\theta x^3} & \text{for } 0 < x < \infty \\
0 & \text{otherwise},
\end{cases}
$$

where $\theta > 0$ is an unknown parameter. Find a sufficient statistic for $\theta$.

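8. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Weibull distribution with
probability density function

$$
f(x) =
\begin{cases}
\dfrac{\beta}{\theta^{\beta}}\, x^{\beta-1} e^{-\left(\frac{x}{\theta}\right)^{\beta}} & \text{if } x > 0 \\[2mm]
0 & \text{otherwise},
\end{cases}
$$

where $\theta > 0$ and $\beta > 0$ are parameters. Find a sufficient statistic for $\theta$
with $\beta$ known, say $\beta = 2$. If $\beta$ is unknown, can you find a single sufficient
statistic for $\theta$?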


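9. Let $X_1, X_2$ be a random sample of size 2 from a population with probability
density

$$
f(x; \theta) =
\begin{cases}
\dfrac{1}{\theta}\, e^{-\frac{x}{\theta}} & \text{if } 0 < x < \infty \\[2mm]
0 & \text{otherwise},
\end{cases}
$$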

where 9 > 0 is an unknown parameter. If Y = then what should 

be the value of the constant k such that kY is an unbiased estimator of the 
parameter 9 ? 


10. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with probability
density function

$$
f(x; \theta) =
\begin{cases}
\dfrac{1}{\theta} & \text{if } 0 < x < \theta \\[2mm]
0 & \text{otherwise},
\end{cases}
$$

where $\theta > 0$ is an unknown parameter. If $\overline{X}$ denotes the sample mean, then
what should be the value of the constant $k$ such that $k\overline{X}$ is an unbiased estimator
of $\theta$?

11. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with probability
density function

$$
f(x; \theta) =
\begin{cases}
\dfrac{1}{\theta} & \text{if } 0 < x < \theta \\[2mm]
0 & \text{otherwise},
\end{cases}
$$

where $\theta > 0$ is an unknown parameter. If $X_{med}$ denotes the sample median,
then what should be the value of the constant $k$ such that $kX_{med}$ is an unbiased
estimator of $\theta$?

12. What do you understand by an unbiased estimator of a parameter $\theta$?
What is the basic principle of the maximum likelihood estimation of a parameter
$\theta$? What is the basic principle of the Bayesian estimation of a parameter
$\theta$? What is the main difference between the Bayesian method and the likelihood
method?

13. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population $X$ with density
function


f(x; 9) 


e 

(1 

0 


for 0 < x < oo 
otherwise, 


where $\theta > 0$ is an unknown parameter. What is a sufficient statistic for the
parameter $\theta$?

14. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population $X$ with density
function


f{x\9) 


x 2 

e~ m 3 for 0 < x < 00 
0 otherwise, 


where $\theta$ is an unknown parameter. What is a sufficient statistic for the
parameter $\theta$?

15. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with density
function

$$
f(x; \theta) =
\begin{cases}
e^{-(x-\theta)} & \text{for } \theta < x < \infty \\
0 & \text{otherwise},
\end{cases}
$$

where $-\infty < \theta < \infty$ is a parameter. What is the maximum likelihood
estimator of $\theta$? Find a sufficient statistic for the parameter $\theta$.

16. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with density
function

$$
f(x; \theta) =
\begin{cases}
e^{-(x-\theta)} & \text{for } \theta < x < \infty \\
0 & \text{otherwise},
\end{cases}
$$

where $-\infty < \theta < \infty$ is a parameter. Are the estimators $X_{(1)}$ and $\overline{X} - 1$
unbiased estimators of $\theta$? Which one is more efficient than the other?


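17. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population $X$ with density
function

$$
f(x; \theta) =
\begin{cases}
\theta x^{\theta-1} & \text{for } 0 < x < 1 \\
0 & \text{otherwise},
\end{cases}
$$

where $\theta > 1$ is an unknown parameter. What is a sufficient statistic for the
parameter $\theta$?

18. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population $X$ with density
function

$$
f(x; \theta) =
\begin{cases}
\theta \alpha x^{\alpha-1} e^{-\theta x^{\alpha}} & \text{for } 0 < x < \infty \\
0 & \text{otherwise},
\end{cases}
$$

where $\theta > 0$ and $\alpha > 0$ are parameters. What is a sufficient statistic for the
parameter $\theta$ for a fixed $\alpha$?

19. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population $X$ with density
function

$$
f(x; \theta) =
\begin{cases}
\dfrac{\theta \alpha^{\theta}}{x^{\theta+1}} & \text{for } \alpha < x < \infty \\[2mm]
0 & \text{otherwise},
\end{cases}
$$

where $\theta > 0$ and $\alpha > 0$ are parameters. What is a sufficient statistic for the
parameter $\theta$ for a fixed $\alpha$?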


20. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population $X$ with density
function

$$
f(x; \theta) =
\begin{cases}
\dbinom{m}{x} \theta^{x} (1-\theta)^{m-x} & \text{for } x = 0, 1, 2, \ldots, m \\[2mm]
0 & \text{otherwise},
\end{cases}
$$

where $0 < \theta < 1$ is a parameter. Show that $\frac{\overline{X}}{m}$ is a uniform minimum variance
unbiased estimator of $\theta$ for a fixed $m$.

21. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population $X$ with density
function

$$
f(x; \theta) =
\begin{cases}
\theta x^{\theta-1} & \text{for } 0 < x < 1 \\
0 & \text{otherwise},
\end{cases}
$$

where $\theta > 1$ is a parameter. Show that $-\frac{1}{n}\sum_{i=1}^{n} \ln(X_i)$ is a uniform minimum
variance unbiased estimator of $\frac{1}{\theta}$.

22. Let $X_1, X_2, \ldots, X_n$ be a random sample from a uniform population $X$
on the interval $[0, \theta]$, where $\theta > 0$ is a parameter. Is the maximum likelihood estimator
$\hat{\theta} = X_{(n)}$ of $\theta$ a consistent estimator of $\theta$?

23. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population $X \sim POI(\lambda)$,
where $\lambda > 0$ is a parameter. Is the estimator $\overline{X}$ of $\lambda$ a consistent estimator
of $\lambda$?


24. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population $X$ having the
probability density function

$$
f(x; \theta) =
\begin{cases}
\theta x^{\theta-1} & \text{if } 0 < x < 1 \\
0 & \text{otherwise},
\end{cases}
$$

where $\theta > 0$ is a parameter. Is the estimator $\hat{\theta} = \frac{\overline{X}}{1-\overline{X}}$ of $\theta$, obtained by the
moment method, a consistent estimator of $\theta$?

25. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population $X$ having the
probability density function

$$
f(x; p) =
\begin{cases}
\dbinom{m}{x} p^{x} (1-p)^{m-x} & \text{if } x = 0, 1, 2, \ldots, m \\[2mm]
0 & \text{otherwise},
\end{cases}
$$

where $0 < p < 1$ is a parameter and $m$ is a fixed positive integer. What is the
maximum likelihood estimator for $p$? Is this maximum likelihood estimator
for $p$ an efficient estimator?


26. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population $X$ having the
probability density function

$$
f(x; \theta) =
\begin{cases}
\theta x^{\theta-1} & \text{if } 0 < x < 1 \\
0 & \text{otherwise},
\end{cases}
$$

where $\theta > 0$ is a parameter. Is the estimator $\hat{\theta}$ of $\theta$, obtained by the
moment method, a consistent estimator of $\theta$? Justify your answer.

Chapter 17

SOME TECHNIQUES 

FOR FINDING INTERVAL 
ESTIMATORS 
FOR 

PARAMETERS 


In point estimation we find a value for the parameter $\theta$ given sample
data. For example, if $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a
population with probability density function


f(x\ 9) = 



then the likelihood function of 9 is 


for x > 9 
otherwise, 


m = n 

i= 1 



where X\ > 9, x 2 > 0 , ^ 0- This likelihood function simplifies to 


L{9) 



i=l 


where $\min\{x_1, x_2, \ldots, x_n\} > \theta$. Taking the natural logarithm of $L(\theta)$ and
maximizing, we obtain the maximum likelihood estimator of $\theta$ as the first
order statistic of the sample $X_1, X_2, \ldots, X_n$, that is

$$ \hat{\theta} = X_{(1)} $$

where $X_{(1)} = \min\{X_1, X_2, \ldots, X_n\}$. Suppose the true value of $\theta$ is 1. Using
the maximum likelihood estimator of $\theta$, we are trying to guess this value of
$\theta$ based on a random sample. Suppose $X_1 = 1.5$, $X_2 = 1.1$, $X_3 = 1.7$, $X_4 = 2.1$,
$X_5 = 3.1$ is a set of sample data from the above population. Then based
on this random sample, we will get

$$ \hat{\theta}_{ML} = X_{(1)} = \min\{1.5, 1.1, 1.7, 2.1, 3.1\} = 1.1. $$

If we take another random sample, say $X_1 = 1.8$, $X_2 = 2.1$, $X_3 = 2.5$, $X_4 = 3.1$,
$X_5 = 2.6$, then the maximum likelihood estimate of this $\theta$ will be $\hat{\theta} = 1.8$
based on this sample. The graph of the density function $f(x; \theta)$ for $\theta = 1$ is
shown below.


Graph of the Density Function 



From the graph, it is clear that a number close to 1 has a higher chance of
being randomly picked by the sampling process than the numbers that are
substantially bigger than 1. Hence, it makes sense that $\theta$ should be estimated
by the smallest sample value. However, from this example we see that the
point estimate of $\theta$ is not equal to the true value of $\theta$. Even if we take many
random samples, the estimate of $\theta$ will rarely equal the actual value of
the parameter. Hence, instead of finding a single value for $\theta$, we should
report a range of probable values for the parameter $\theta$ with a certain degree of
confidence. This brings us to the notion of a confidence interval for a parameter.

17.1. Interval Estimators and Confidence Intervals for Parameters 

The interval estimation problem can be stated as follows: Given a random
sample $X_1, X_2, \ldots, X_n$ and a probability value $1 - \alpha$, find a pair of statistics
$L = L(X_1, X_2, \ldots, X_n)$ and $U = U(X_1, X_2, \ldots, X_n)$ with $L \leq U$ such that the
probability of $\theta$ being on the random interval $[L, U]$ is $1 - \alpha$. That is

$$ P(L \leq \theta \leq U) = 1 - \alpha. $$

Recall that a sample is a portion of the population usually chosen by
the method of random sampling and as such it is a set of random variables
$X_1, X_2, \ldots, X_n$ with the same probability density function $f(x; \theta)$ as the
population. Once the sampling is done, we get

$$ X_1 = x_1, \; X_2 = x_2, \; \ldots, \; X_n = x_n $$

where $x_1, x_2, \ldots, x_n$ are the sample data.

Definition 17.1. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from
a population $X$ with density $f(x; \theta)$, where $\theta$ is an unknown parameter.
The interval estimator of $\theta$ is a pair of statistics $L = L(X_1, X_2, \ldots, X_n)$ and
$U = U(X_1, X_2, \ldots, X_n)$ with $L \leq U$ such that if $x_1, x_2, \ldots, x_n$ is a set of sample
data, then $\theta$ belongs to the interval $[L(x_1, x_2, \ldots, x_n), U(x_1, x_2, \ldots, x_n)]$.

The interval $[l, u]$ will be denoted as an interval estimate of $\theta$ whereas the
random interval $[L, U]$ will denote the interval estimator of $\theta$. Notice that
the interval estimator of $\theta$ is the random interval $[L, U]$. Next, we define the
$100(1 - \alpha)\%$ confidence interval for the unknown parameter $\theta$.

Definition 17.2. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a
population $X$ with density $f(x; \theta)$, where $\theta$ is an unknown parameter. The
interval estimator of $\theta$ is called a $100(1 - \alpha)\%$ confidence interval for $\theta$ if

$$ P(L \leq \theta \leq U) = 1 - \alpha. $$

The random variable $L$ is called the lower confidence limit and $U$ is called the
upper confidence limit. The number $(1 - \alpha)$ is called the confidence coefficient
or degree of confidence.

There are several methods for constructing confidence intervals for an
unknown parameter $\theta$. Some well known methods are: (1) Pivotal Quantity
Method, (2) Maximum Likelihood Estimator (MLE) Method, (3) Bayesian
Method, (4) Invariant Methods, (5) Inversion of Test Statistic Method, and
(6) The Statistical or General Method.

In this chapter, we only focus on the pivotal quantity method and the
MLE method. We also briefly examine the statistical or general method.
The pivotal quantity method is mainly due to George Barnard and David
Fraser of the University of Waterloo, and this method is perhaps one of
the most elegant methods of constructing confidence intervals for unknown
parameters.

17.2. Pivotal Quantity Method

In this section, we explain how the notion of pivotal quantity can be
used to construct a confidence interval for an unknown parameter. We will
also examine how to find pivotal quantities for parameters associated with
certain probability density functions. We begin with the formal definition of
the pivotal quantity.

Definition 17.3. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a
population $X$ with probability density function $f(x; \theta)$, where $\theta$ is an
unknown parameter. A pivotal quantity $Q$ is a function of $X_1, X_2, \ldots, X_n$ and $\theta$
whose probability distribution is independent of the parameter $\theta$.

Notice that the pivotal quantity $Q(X_1, X_2, \ldots, X_n, \theta)$ will usually contain
both the parameter $\theta$ and an estimator (that is, a statistic) of $\theta$. Now we
give an example of a pivotal quantity.

Example 17.1. Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal
population $X$ with mean $\mu$ and a known variance $\sigma^2$. Find a pivotal quantity
for the unknown parameter $\mu$.

Answer: Since each $X_i \sim N(\mu, \sigma^2)$,

$$ \overline{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right). $$

Standardizing $\overline{X}$, we see that

$$ \frac{\overline{X} - \mu}{\frac{\sigma}{\sqrt{n}}} \sim N(0, 1). $$

The statistic $Q$ given by

$$ Q(X_1, X_2, \ldots, X_n, \mu) = \frac{\overline{X} - \mu}{\frac{\sigma}{\sqrt{n}}} $$

is a pivotal quantity since it is a function of $X_1, X_2, \ldots, X_n$ and $\mu$ and its
probability density function is free of the parameter $\mu$.

There is no general rule for finding a pivotal quantity (or pivot) for
a parameter $\theta$ of an arbitrarily given density function $f(x; \theta)$. Hence to
some extent, finding pivots relies on guesswork. However, if the probability
density function $f(x; \theta)$ belongs to the location-scale family, then there is a
systematic way to find pivots.

Definition 17.4. Let $g : \mathbb{R} \to \mathbb{R}$ be a probability density function. Then for
any $\mu$ and any $\sigma > 0$, the family of functions

$$ \mathcal{F} = \left\{ f(x; \mu, \sigma) = \frac{1}{\sigma}\, g\left(\frac{x - \mu}{\sigma}\right) \;\Big|\; \mu \in (-\infty, \infty), \; \sigma \in (0, \infty) \right\} $$

is called the location-scale family with standard probability density $g$.
The parameter $\mu$ is called the location parameter and the parameter $\sigma$ is
called the scale parameter. If $\sigma = 1$, then $\mathcal{F}$ is called the location family. If
$\mu = 0$, then $\mathcal{F}$ is called the scale family.

It should be noted that each member $f(x; \mu, \sigma)$ of the location-scale
family is a probability density function. If we take $g(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}$, then
the normal density function

$$ f(x; \mu, \sigma) = \frac{1}{\sigma}\, g\left(\frac{x - \mu}{\sigma}\right) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}, \qquad -\infty < x < \infty, $$

belongs to the location-scale family. The density function

$$
f(x; \theta) =
\begin{cases}
\dfrac{1}{\theta}\, e^{-\frac{x}{\theta}} & \text{if } 0 < x < \infty \\[2mm]
0 & \text{otherwise},
\end{cases}
$$

belongs to the scale family. However, the density function

$$
f(x; \theta) =
\begin{cases}
\theta x^{\theta-1} & \text{if } 0 < x < 1 \\
0 & \text{otherwise},
\end{cases}
$$

does not belong to the location-scale family.

It is relatively easy to find pivotal quantities for a location or scale parameter
when the density function of the population belongs to the location-scale
family $\mathcal{F}$. When the density function belongs to the location family, the pivot
for the location parameter $\mu$ is $\hat{\mu} - \mu$, where $\hat{\mu}$ is the maximum likelihood
estimator of $\mu$. If $\hat{\sigma}$ is the maximum likelihood estimator of $\sigma$, then the pivot
for the scale parameter $\sigma$ is $\frac{\hat{\sigma}}{\sigma}$ when the density function belongs to the scale
family. The pivot for the location parameter $\mu$ is $\frac{\hat{\mu} - \mu}{\hat{\sigma}}$ and the pivot for the scale
parameter $\sigma$ is $\frac{\hat{\sigma}}{\sigma}$ when the density function belongs to the location-scale family.
Sometimes it is appropriate to make a minor modification to the pivot
obtained in this way, such as multiplying by a constant, so that the modified
pivot will have a known distribution.

Remark 17.1. A pivotal quantity can also be constructed using a sufficient
statistic for the parameter. Suppose $T = T(X_1, X_2, \ldots, X_n)$ is a sufficient
statistic based on a random sample $X_1, X_2, \ldots, X_n$ from a population $X$ with
probability density function $f(x; \theta)$. Let the probability density function of
$T$ be $g(t; \theta)$. If $g(t; \theta)$ belongs to the location family, then an appropriate
constant multiple of $T - a(\theta)$ is a pivotal quantity for the location parameter
$\theta$ for some suitable expression $a(\theta)$. If $g(t; \theta)$ belongs to the scale family, then
an appropriate constant multiple of $\frac{T}{b(\theta)}$ is a pivotal quantity for the scale
parameter $\theta$ for some suitable expression $b(\theta)$. Similarly, if $g(t; \theta)$ belongs to
the location-scale family, then an appropriate constant multiple of $\frac{T - a(\theta)}{b(\theta)}$ is
a pivotal quantity for the location parameter $\theta$ for some suitable expressions
$a(\theta)$ and $b(\theta)$.

Algebraic manipulations of pivots are key factors in finding confidence
intervals. If $Q = Q(X_1, X_2, \ldots, X_n, \theta)$ is a pivot, then a $100(1 - \alpha)\%$ confidence
interval for $\theta$ may be constructed as follows: First, find two values $a$ and $b$
such that

$$ P(a \leq Q \leq b) = 1 - \alpha, $$

then convert the inequality $a \leq Q \leq b$ into the form $L \leq \theta \leq U$.

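For example, if $X$ is a normal population with unknown mean $\mu$ and known
variance $\sigma^2$, then its pdf belongs to the location-scale family. A pivot for $\mu$
is $\frac{\overline{X} - \mu}{S}$. However, since the variance $\sigma^2$ is known, there is no need to take
$S$. So we consider the pivot $\frac{\overline{X} - \mu}{\sigma}$ to construct the $100(1 - 2\alpha)\%$ confidence
interval for $\mu$. Since our population $X \sim N(\mu, \sigma^2)$, the sample mean $\overline{X}$ is
also normal with the same mean $\mu$ and variance equal to $\frac{\sigma^2}{n}$. Hence

$$
\begin{aligned}
1 - 2\alpha &= P\left(-z_{\alpha} \leq \frac{\overline{X} - \mu}{\frac{\sigma}{\sqrt{n}}} \leq z_{\alpha}\right) \\
&= P\left(\mu - z_{\alpha}\frac{\sigma}{\sqrt{n}} \leq \overline{X} \leq \mu + z_{\alpha}\frac{\sigma}{\sqrt{n}}\right) \\
&= P\left(\overline{X} - z_{\alpha}\frac{\sigma}{\sqrt{n}} \leq \mu \leq \overline{X} + z_{\alpha}\frac{\sigma}{\sqrt{n}}\right).
\end{aligned}
$$

Therefore, the $100(1 - 2\alpha)\%$ confidence interval for $\mu$ is

$$ \left[\overline{X} - z_{\alpha}\frac{\sigma}{\sqrt{n}}, \; \overline{X} + z_{\alpha}\frac{\sigma}{\sqrt{n}}\right]. $$

Here $z_{\alpha}$ denotes the $100(1 - \alpha)$-percentile (or $(1 - \alpha)$-quantile) of a standard
normal random variable $Z$, that is

$$ P(Z \leq z_{\alpha}) = 1 - \alpha, $$

where $\alpha \leq 0.5$ (see figure below). Note that $\alpha = P(Z \leq -z_{\alpha})$ if $\alpha \leq 0.5$.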



A $100(1 - \alpha)\%$ confidence interval for a parameter $\theta$ has the following
interpretation. If $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$ is a sample of size $n$, then
based on this sample we construct a $100(1 - \alpha)\%$ confidence interval $[l, u]$
which is a subinterval of the real line $\mathbb{R}$. Suppose we take a large number of
samples from the underlying population and construct all the corresponding
$100(1 - \alpha)\%$ confidence intervals, then approximately $100(1 - \alpha)\%$ of these
intervals would include the unknown value of the parameter $\theta$.
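The long-run interpretation above can be illustrated by simulation. The following is a rough sketch, not a prescribed procedure: it assumes a normal population with known standard deviation, uses the standard-library `statistics.NormalDist` for the normal quantile, and the names `z_interval` and `coverage`, as well as the parameter values, are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def z_interval(sample, sigma, conf):
    """Two-sided interval for the mean of a normal population with known sigma."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # upper alpha/2 quantile
    half = z * sigma / n ** 0.5
    m = fmean(sample)
    return m - half, m + half

def coverage(mu, sigma, n, conf, trials, seed=0):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma, conf)
        hits += lo <= mu <= hi
    return hits / trials

# Hypothetical settings: true mean 1, sigma 2, samples of size 20.
print(coverage(mu=1.0, sigma=2.0, n=20, conf=0.95, trials=5000))
```

The printed fraction should be close to 0.95; it describes the procedure over many samples, not the probability that any one computed interval contains the parameter.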

In the next several sections, we illustrate how pivotal quantity method 
can be used to determine confidence intervals for various parameters. 

17.3. Confidence Interval for Population Mean 

At the outset, we use the pivotal quantity method to construct a confidence
interval for the mean of a normal population. Here we assume first that
the population variance is known and then that the variance is unknown. Next, we
construct the confidence interval for the mean of a population with a continuous,
symmetric and unimodal probability distribution by applying the central
limit theorem.

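Let $X_1, X_2, \ldots, X_n$ be a random sample from a population $X \sim N(\mu, \sigma^2)$,
where $\mu$ is an unknown parameter and $\sigma^2$ is a known parameter. First of all,
we need a pivotal quantity $Q(X_1, X_2, \ldots, X_n, \mu)$. To construct this pivotal
quantity, we find the maximum likelihood estimator of the parameter $\mu$. We know that
$\hat{\mu} = \overline{X}$. Since each $X_i \sim N(\mu, \sigma^2)$, the distribution of the sample mean is
given by

$$ \overline{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right). $$

It is easy to see that the distribution of the estimator of $\mu$ is not independent
of the parameter $\mu$. If we standardize $\overline{X}$, then we get

$$ \frac{\overline{X} - \mu}{\frac{\sigma}{\sqrt{n}}} \sim N(0, 1). $$

The distribution of the standardized $\overline{X}$ is independent of the parameter $\mu$.
This standardized $\overline{X}$ is the pivotal quantity since it is a function of the
sample $X_1, X_2, \ldots, X_n$ and the parameter $\mu$, and its probability distribution
is independent of the parameter $\mu$. Using this pivotal quantity, we construct
the confidence interval as follows:

$$
\begin{aligned}
1 - \alpha &= P\left(-z_{\frac{\alpha}{2}} \leq \frac{\overline{X} - \mu}{\frac{\sigma}{\sqrt{n}}} \leq z_{\frac{\alpha}{2}}\right) \\
&= P\left(\overline{X} - \frac{\sigma}{\sqrt{n}}\, z_{\frac{\alpha}{2}} \leq \mu \leq \overline{X} + \frac{\sigma}{\sqrt{n}}\, z_{\frac{\alpha}{2}}\right).
\end{aligned}
$$

Hence, the $100(1 - \alpha)\%$ confidence interval for $\mu$ when the population $X$ is
normal with the known variance $\sigma^2$ is given by

$$ \left[\overline{X} - \frac{\sigma}{\sqrt{n}}\, z_{\frac{\alpha}{2}}, \; \overline{X} + \frac{\sigma}{\sqrt{n}}\, z_{\frac{\alpha}{2}}\right]. $$

This says that if samples of size $n$ are taken from a normal population with
mean $\mu$ and known variance $\sigma^2$ and if the interval

$$ \left[\overline{X} - \frac{\sigma}{\sqrt{n}}\, z_{\frac{\alpha}{2}}, \; \overline{X} + \frac{\sigma}{\sqrt{n}}\, z_{\frac{\alpha}{2}}\right] $$

is constructed for every sample, then in the long run $100(1 - \alpha)\%$ of the
intervals will cover the unknown parameter $\mu$ and hence with a confidence of
$100(1 - \alpha)\%$ we can say that $\mu$ lies in the interval

$$ \left[\overline{X} - \frac{\sigma}{\sqrt{n}}\, z_{\frac{\alpha}{2}}, \; \overline{X} + \frac{\sigma}{\sqrt{n}}\, z_{\frac{\alpha}{2}}\right]. $$

The interval estimate of $\mu$ is found by taking a good (here maximum likelihood)
estimator $\overline{X}$ of $\mu$ and adding and subtracting $z_{\frac{\alpha}{2}}$ times the standard
deviation of $\overline{X}$.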

Remark 17.2. By definition a $100(1 - \alpha)\%$ confidence interval for a parameter
$\theta$ is an interval $[L, U]$ such that the probability of $\theta$ being in the interval
$[L, U]$ is $1 - \alpha$. That is

$$ 1 - \alpha = P(L \leq \theta \leq U). $$

One can find infinitely many pairs $L$, $U$ such that

$$ 1 - \alpha = P(L \leq \theta \leq U). $$

Hence, there are infinitely many confidence intervals for a given parameter.
However, we only consider the confidence interval of shortest length. If a
confidence interval is constructed by omitting equal tail areas then we obtain
what is known as the central confidence interval. In a symmetric distribution,
it can be shown that the central confidence interval is of the shortest length.


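Example 17.2. Let $X_1, X_2, \ldots, X_{11}$ be a random sample of size 11 from
a normal distribution with unknown mean $\mu$ and variance $\sigma^2 = 9.9$. If
$\sum_{i=1}^{11} x_i = 132$, then what is the 95% confidence interval for $\mu$?

Answer: Since each $X_i \sim N(\mu, 9.9)$, the confidence interval for $\mu$ is given
by

$$ \left[\overline{X} - \frac{\sigma}{\sqrt{n}}\, z_{\frac{\alpha}{2}}, \; \overline{X} + \frac{\sigma}{\sqrt{n}}\, z_{\frac{\alpha}{2}}\right]. $$

Since $\sum_{i=1}^{11} x_i = 132$, the sample mean is $\overline{x} = \frac{132}{11} = 12$. Also, we see that

$$ \sqrt{\frac{\sigma^2}{n}} = \sqrt{\frac{9.9}{11}} = \sqrt{0.9}. $$

Further, since $1 - \alpha = 0.95$, $\alpha = 0.05$. Thus

$$ z_{\frac{\alpha}{2}} = z_{0.025} = 1.96 \quad \text{(from normal table)}. $$

Using this information in the expression of the confidence interval for $\mu$, we
get

$$ \left[12 - 1.96\sqrt{0.9}, \; 12 + 1.96\sqrt{0.9}\right] $$

that is

$$ [10.141, \; 13.859]. $$

Example 17.3. Let $X_1, X_2, \ldots, X_{11}$ be a random sample of size 11 from
a normal distribution with unknown mean $\mu$ and variance $\sigma^2 = 9.9$. If
$\sum_{i=1}^{11} x_i = 132$, then for what value of the constant $k$ is

$$ \left[12 - k\sqrt{0.9}, \; 12 + k\sqrt{0.9}\right] $$

a 90% confidence interval for $\mu$?

Answer: The 90% confidence interval for $\mu$ when the variance is given is

$$ \left[\overline{x} - \frac{\sigma}{\sqrt{n}}\, z_{\frac{\alpha}{2}}, \; \overline{x} + \frac{\sigma}{\sqrt{n}}\, z_{\frac{\alpha}{2}}\right]. $$

Thus we need to find $\overline{x}$, $\sqrt{\frac{\sigma^2}{n}}$, and $z_{\frac{\alpha}{2}}$ corresponding to $1 - \alpha = 0.9$. Hence

$$
\begin{aligned}
\overline{x} &= \frac{132}{11} = 12, \\
\sqrt{\frac{\sigma^2}{n}} &= \sqrt{\frac{9.9}{11}} = \sqrt{0.9}, \\
z_{0.05} &= 1.64 \quad \text{(from normal table)}.
\end{aligned}
$$

Hence, the confidence interval for $\mu$ at the 90% confidence level is

$$ \left[12 - (1.64)\sqrt{0.9}, \; 12 + (1.64)\sqrt{0.9}\right]. $$

Comparing this interval with the given interval, we get

$$ k = 1.64 $$

and the corresponding 90% confidence interval is $[10.444, 13.556]$.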
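The arithmetic of Examples 17.2 and 17.3 can be reproduced with a few lines of code. This is a sketch under the same assumptions as the examples (normal population, known variance); the function name `z_confidence_interval` is hypothetical, and the quantile comes from the standard-library `statistics.NormalDist` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma2, n, conf):
    """Interval xbar -/+ z_{alpha/2} * sqrt(sigma2 / n) for a known variance sigma2."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * sqrt(sigma2 / n)
    return xbar - half, xbar + half

# Data of Examples 17.2 and 17.3: n = 11, sum of observations 132, variance 9.9.
xbar = 132 / 11
print(z_confidence_interval(xbar, 9.9, 11, 0.95))   # compare with [10.141, 13.859]
print(z_confidence_interval(xbar, 9.9, 11, 0.90))   # compare with [10.444, 13.556]
```

The 90% endpoints differ from the text in the third decimal place because the code uses the quantile $z_{0.05} \approx 1.645$, whereas the table value used in Example 17.3 is rounded to 1.64.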

Remark 17.3. Notice that the length of the 90% confidence interval for $\mu$
is 3.112. However, the length of the 95% confidence interval is 3.718. Thus
the higher the confidence level, the bigger the length of the confidence interval.
Hence, the length of the confidence interval increases with the confidence
level. In view of this fact, we see that if the confidence level is zero,
then the length is also zero. That is, when the confidence level is zero, the
confidence interval of $\mu$ degenerates into the point $\overline{X}$.

Until now we have considered the case when the population is normal
with unknown mean $\mu$ and known variance $\sigma^2$. Now we consider the case
when the population is non-normal but its probability density function is
continuous, symmetric and unimodal. If the sample size is large, then by the
central limit theorem

$$ \frac{\overline{X} - \mu}{\frac{\sigma}{\sqrt{n}}} \sim N(0, 1) \quad \text{as } n \to \infty. $$

Thus, in this case we can take the pivotal quantity to be

$$ Q(X_1, X_2, \ldots, X_n, \mu) = \frac{\overline{X} - \mu}{\frac{\sigma}{\sqrt{n}}}, $$

if the sample size is large (generally $n \geq 32$). Since the pivotal quantity is the
same as before, we get the same expression for the $100(1 - \alpha)\%$ confidence
interval, that is

$$ \left[\overline{X} - \frac{\sigma}{\sqrt{n}}\, z_{\frac{\alpha}{2}}, \; \overline{X} + \frac{\sigma}{\sqrt{n}}\, z_{\frac{\alpha}{2}}\right]. $$

Example 17.4. Let $X_1, X_2, \ldots, X_{40}$ be a random sample of size 40 from a distribution with known variance and unknown mean $\mu$. If $\sum_{i=1}^{40} x_i = 286.56$ and $\sigma^2 = 10$, then what is the 90 percent confidence interval for the population mean $\mu$?

Answer: Since $1 - \alpha = 0.90$, we get $\frac{\alpha}{2} = 0.05$. Hence, $z_{0.05} = 1.64$ (from the standard normal table). Next, we find the sample mean

$$\bar{x} = \frac{286.56}{40} = 7.164.$$

Hence, the confidence interval for $\mu$ is given by

$$\left[\, 7.164 - (1.64)\sqrt{\frac{10}{40}},\ \ 7.164 + (1.64)\sqrt{\frac{10}{40}} \,\right],$$

that is

$$[6.344,\ 7.984].$$

Example 17.5. In sampling from a nonnormal distribution with a variance of 25, how large must the sample size be so that the length of a 95% confidence interval for the mean is 1.96?

Answer: For a large sample taken from a population with a variance of 25, the confidence interval is

$$\left[\, \bar{x} - z_{\alpha/2}\sqrt{\frac{25}{n}},\ \ \bar{x} + z_{\alpha/2}\sqrt{\frac{25}{n}} \,\right].$$

Thus the length of the confidence interval is

$$\ell = 2\, z_{\alpha/2}\sqrt{\frac{25}{n}} = 2\, z_{0.025}\sqrt{\frac{25}{n}} = 2\,(1.96)\sqrt{\frac{25}{n}}.$$

But we are given that the length of the confidence interval is $\ell = 1.96$. Thus

$$1.96 = 2\,(1.96)\sqrt{\frac{25}{n}}, \qquad \sqrt{n} = 10, \qquad n = 100.$$

Hence, the sample size must be 100 so that the length of the 95% confidence interval will be 1.96.
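The sample-size calculation of Example 17.5 amounts to solving $2 z \sigma/\sqrt{n} \le \ell$ for $n$. The sketch below assumes a hypothetical helper `sample_size_for_length`; it uses exact rational arithmetic so that a boundary case like this one (where the answer is a perfect square) is not pushed up to the next integer by floating-point rounding.

```python
from fractions import Fraction
from math import ceil

def sample_size_for_length(sigma, z, length):
    """Smallest n with 2 * z * sigma / sqrt(n) <= length.

    Inputs are converted to exact fractions via their decimal strings,
    which avoids floating-point noise at exact boundaries.
    """
    sigma, z, length = (Fraction(str(v)) for v in (sigma, z, length))
    return ceil((2 * z * sigma / length) ** 2)

# Example 17.5: variance 25 (sigma = 5), z_{0.025} = 1.96, target length 1.96.
n = sample_size_for_length(sigma=5, z=1.96, length=1.96)
print(n)
```

For a target length of 1.0 with the same variance and level, the same function gives the next integer above $(19.6)^2 = 384.16$.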

So far, we have discussed the method of construction of a confidence interval for the population mean when the variance is known. It is very unlikely that one will know the variance without knowing the population mean, and thus what we have treated so far in this section is not very realistic. Now we treat the case of constructing the confidence interval for the population mean when the population variance is also unknown. First of all, we begin with the construction of the confidence interval assuming the population $X$ is normal.

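Suppose $X_1, X_2, \ldots, X_n$ is a random sample from a normal population $X$ with mean $\mu$ and variance $\sigma^2 > 0$. Let the sample mean and sample variance be $\bar{X}$ and $S^2$ respectively. Then

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$$

and

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1).$$

Therefore, the random variable defined by the ratio of $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ to $\sqrt{\frac{(n-1)S^2}{(n-1)\sigma^2}}$ has a $t$-distribution with $(n-1)$ degrees of freedom, that is

$$Q(X_1, X_2, \ldots, X_n, \mu) = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2}{(n-1)\sigma^2}}} = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1),$$

where $Q$ is the pivotal quantity to be used for the construction of the confidence interval for $\mu$. Using this pivotal quantity, we construct the confidence interval as follows:

$$1 - \alpha = P\left( -t_{\alpha/2}(n-1) \le \frac{\bar{X} - \mu}{S/\sqrt{n}} \le t_{\alpha/2}(n-1) \right)
= P\left( \bar{X} - \frac{S}{\sqrt{n}}\, t_{\alpha/2}(n-1) \le \mu \le \bar{X} + \frac{S}{\sqrt{n}}\, t_{\alpha/2}(n-1) \right).$$

Hence, the $100(1-\alpha)\%$ confidence interval for $\mu$ when the population $X$ is normal with unknown variance $\sigma^2$ is given by

$$\left[\, \bar{X} - \frac{S}{\sqrt{n}}\, t_{\alpha/2}(n-1),\ \ \bar{X} + \frac{S}{\sqrt{n}}\, t_{\alpha/2}(n-1) \right].$$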




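Example 17.6. A random sample of 9 observations from a normal population yields the observed statistics $\bar{x} = 5$ and $\frac{1}{8}\sum_{i=1}^{9}(x_i - \bar{x})^2 = 36$. What is the 95% confidence interval for $\mu$?

Answer: Since

$$n = 9, \qquad \bar{x} = 5, \qquad s^2 = 36 \qquad \text{and} \qquad 1 - \alpha = 0.95,$$

the 95% confidence interval for $\mu$ is given by

$$\left[\, \bar{x} - \frac{s}{\sqrt{n}}\, t_{\alpha/2}(n-1),\ \ \bar{x} + \frac{s}{\sqrt{n}}\, t_{\alpha/2}(n-1) \right],$$

that is

$$\left[\, 5 - \left(\frac{6}{\sqrt{9}}\right) t_{0.025}(8),\ \ 5 + \left(\frac{6}{\sqrt{9}}\right) t_{0.025}(8) \right],$$

which is

$$\left[\, 5 - \left(\frac{6}{\sqrt{9}}\right)(2.306),\ \ 5 + \left(\frac{6}{\sqrt{9}}\right)(2.306) \right].$$

Hence, the 95% confidence interval for $\mu$ is given by $[0.388,\ 9.612]$.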
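The computation in Example 17.6 can be reproduced as follows. This is a sketch under the assumption that the critical value $t_{0.025}(8) = 2.306$ is read from a $t$ table, as in the text; `t_interval` is a hypothetical helper name.

```python
from math import sqrt

def t_interval(xbar, s2, n, t_crit):
    """Unknown-variance interval: xbar -/+ t_crit * sqrt(s2 / n)."""
    half_width = t_crit * sqrt(s2 / n)
    return xbar - half_width, xbar + half_width

# Example 17.6: n = 9, sample mean 5, sample variance 36, t_{0.025}(8) = 2.306.
lower, upper = t_interval(xbar=5, s2=36, n=9, t_crit=2.306)
print(round(lower, 3), round(upper, 3))
```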


Example 17.7. Which of the following is true of a 95% confidence interval 
for the mean of a population? 

(a) The interval includes 95% of the population values on the average. 

(b) The interval includes 95% of the sample values on the average. 

(c) The interval has 95% chance of including the sample mean. 

Answer: None of the statements is correct, since the 95% confidence interval for the population mean $\mu$ means that the interval has a 95% chance of including the population mean $\mu$.

Finally, we consider the case when the population is non-normal but its probability density function is continuous, symmetric and unimodal. If some weak conditions are satisfied, then the sample variance $S^2$ of a random sample of size $n \geq 2$ converges stochastically to $\sigma^2$. Therefore, in

$$\frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2}{(n-1)\sigma^2}}} = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

the numerator of the left-hand member converges to $N(0,1)$ and the denominator of that member converges to 1. Hence

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim N(0,1) \quad \text{as } n \to \infty.$$

This fact can be used for the construction of a confidence interval for the population mean when the variance is unknown and the population distribution is nonnormal. We let the pivotal quantity be

$$Q(X_1, X_2, \ldots, X_n, \mu) = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

and obtain the following confidence interval

$$\left[\, \bar{X} - \frac{S}{\sqrt{n}}\, z_{\alpha/2},\ \ \bar{X} + \frac{S}{\sqrt{n}}\, z_{\alpha/2} \right].$$

We summarize the results of this section by the following table.

Population    Variance $\sigma^2$    Sample Size $n$    Confidence Limits
normal        known                  $n \geq 2$         $\bar{x} \mp z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}}$
normal        not known              $n \geq 2$         $\bar{x} \mp t_{\alpha/2}(n-1)\, \frac{s}{\sqrt{n}}$
not normal    known                  $n \geq 32$        $\bar{x} \mp z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}}$
not normal    known                  $n < 32$           no formula exists
not normal    not known              $n \geq 32$        $\bar{x} \mp t_{\alpha/2}(n-1)\, \frac{s}{\sqrt{n}}$
not normal    not known              $n < 32$           no formula exists


17.4. Confidence Interval for Population Variance 

In this section, we will first describe the method for constructing the confidence interval for the variance when the population is normal with a known population mean $\mu$. Then we treat the case when the population mean is also unknown.

Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal population $X$ with known mean $\mu$ and unknown variance $\sigma^2$. We would like to construct a $100(1-\alpha)\%$ confidence interval for the variance $\sigma^2$, that is, we would like to find the estimates $L$ and $U$ such that

$$P\left(L \le \sigma^2 \le U\right) = 1 - \alpha.$$

To find these estimates $L$ and $U$, we first construct a pivotal quantity. Since

$$X_i \sim N(\mu, \sigma^2),$$

we have

$$\frac{X_i - \mu}{\sigma} \sim N(0,1) \qquad \text{and} \qquad \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2(1).$$

We define the pivotal quantity $Q(X_1, X_2, \ldots, X_n, \sigma^2)$ as

$$Q(X_1, X_2, \ldots, X_n, \sigma^2) = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2,$$

which has a chi-square distribution with $n$ degrees of freedom. Hence
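$$\begin{aligned}
1 - \alpha &= P(a \le Q \le b) \\
&= P\left( a \le \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 \le b \right) \\
&= P\left( \frac{1}{a} \ge \frac{\sigma^2}{\sum_{i=1}^{n}(X_i - \mu)^2} \ge \frac{1}{b} \right) \\
&= P\left( \frac{\sum_{i=1}^{n}(X_i - \mu)^2}{a} \ge \sigma^2 \ge \frac{\sum_{i=1}^{n}(X_i - \mu)^2}{b} \right) \\
&= P\left( \frac{\sum_{i=1}^{n}(X_i - \mu)^2}{b} \le \sigma^2 \le \frac{\sum_{i=1}^{n}(X_i - \mu)^2}{a} \right) \\
&= P\left( \frac{\sum_{i=1}^{n}(X_i - \mu)^2}{\chi^2_{1-\alpha/2}(n)} \le \sigma^2 \le \frac{\sum_{i=1}^{n}(X_i - \mu)^2}{\chi^2_{\alpha/2}(n)} \right).
\end{aligned}$$

Therefore, the $100(1-\alpha)\%$ confidence interval for $\sigma^2$ when the mean is known is given by

$$\left[\, \frac{\sum_{i=1}^{n}(X_i - \mu)^2}{\chi^2_{1-\alpha/2}(n)},\ \ \frac{\sum_{i=1}^{n}(X_i - \mu)^2}{\chi^2_{\alpha/2}(n)} \right].$$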


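Example 17.8. A random sample of 9 observations from a normal population with $\mu = 5$ yields the observed statistics $\frac{1}{8}\sum_{i=1}^{9} x_i^2 = 39.125$ and $\sum_{i=1}^{9} x_i = 45$. What is the 95% confidence interval for $\sigma^2$?

Answer: We have been given that

$$n = 9 \qquad \text{and} \qquad \mu = 5.$$

Further we know that

$$\sum_{i=1}^{9} x_i = 45 \qquad \text{and} \qquad \frac{1}{8}\sum_{i=1}^{9} x_i^2 = 39.125.$$

Hence

$$\sum_{i=1}^{9} x_i^2 = 313,$$

and

$$\sum_{i=1}^{9} (x_i - \mu)^2 = \sum_{i=1}^{9} x_i^2 - 2\mu \sum_{i=1}^{9} x_i + 9\mu^2 = 313 - 450 + 225 = 88.$$

Since $1 - \alpha = 0.95$, we get $\frac{\alpha}{2} = 0.025$ and $1 - \frac{\alpha}{2} = 0.975$. Using the chi-square table we have

$$\chi^2_{0.025}(9) = 2.700 \qquad \text{and} \qquad \chi^2_{0.975}(9) = 19.02.$$

Hence, the 95% confidence interval for $\sigma^2$ is given by

$$\left[\, \frac{\sum_{i=1}^{n}(x_i - \mu)^2}{\chi^2_{1-\alpha/2}(n)},\ \ \frac{\sum_{i=1}^{n}(x_i - \mu)^2}{\chi^2_{\alpha/2}(n)} \right],$$

that is

$$\left[\, \frac{88}{19.02},\ \ \frac{88}{2.7} \right],$$

which is

$$[4.63,\ 32.59].$$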
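The steps of Example 17.8 can be sketched in Python as below. The helper name `variance_interval_known_mean` is hypothetical, and the two chi-square quantiles are the table values quoted in the example, supplied by hand.

```python
def variance_interval_known_mean(sum_x, sum_x2, n, mu, chi2_lower, chi2_upper):
    """Interval [SS / chi2_upper, SS / chi2_lower] with SS = sum (x_i - mu)^2."""
    # Expand sum (x_i - mu)^2 = sum x_i^2 - 2 mu sum x_i + n mu^2.
    ss = sum_x2 - 2 * mu * sum_x + n * mu**2
    return ss / chi2_upper, ss / chi2_lower

# Example 17.8: n = 9, mu = 5, sum x_i = 45, (1/8) sum x_i^2 = 39.125.
sum_x2 = 8 * 39.125
lower, upper = variance_interval_known_mean(
    sum_x=45, sum_x2=sum_x2, n=9, mu=5,
    chi2_lower=2.700,   # chi^2_{0.025}(9) from the table
    chi2_upper=19.02,   # chi^2_{0.975}(9) from the table
)
print(round(lower, 2), round(upper, 2))
```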


Remark 17.4. Since the $\chi^2$ distribution is not symmetric, the above confidence interval is not necessarily the shortest. Later, we describe how one constructs a confidence interval of shortest length.

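Consider a random sample $X_1, X_2, \ldots, X_n$ from a normal population $X \sim N(\mu, \sigma^2)$, where the population mean $\mu$ and population variance $\sigma^2$ are unknown. We want to construct a $100(1-\alpha)\%$ confidence interval for the population variance. We know that

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1),$$

where $(n-1)S^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2$. Taking this random variable as the pivotal quantity and arguing exactly as in the case of a known mean, with $\chi^2(n-1)$ in place of $\chi^2(n)$, we obtain the following. Hence, the $100(1-\alpha)\%$ confidence interval for the variance $\sigma^2$ when the population mean is unknown is given by

$$\left[\, \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\chi^2_{1-\alpha/2}(n-1)},\ \ \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\chi^2_{\alpha/2}(n-1)} \right].$$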


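Example 17.9. Let $X_1, X_2, \ldots, X_{13}$ be a random sample of size 13 from a normal distribution $N(\mu, \sigma^2)$. If $\sum_{i=1}^{13} x_i = 246.61$ and $\sum_{i=1}^{13} x_i^2 = 4806.61$, find the 90% confidence interval for $\sigma^2$.

Answer:

$$\bar{x} = \frac{246.61}{13} = 18.97,$$

$$\begin{aligned}
s^2 &= \frac{1}{n-1}\sum_{i=1}^{13} (x_i - \bar{x})^2 \\
&= \frac{1}{n-1}\left[ \sum_{i=1}^{13} x_i^2 - n\,\bar{x}^2 \right] \\
&= \frac{1}{12}\left[ 4806.61 - 4678.2 \right] \\
&= \frac{1}{12}\, 128.41.
\end{aligned}$$

Hence, $12 s^2 = 128.41$. Further, since $1 - \alpha = 0.90$, we get $\frac{\alpha}{2} = 0.05$ and $1 - \frac{\alpha}{2} = 0.95$. Therefore, from the chi-square table, we get

$$\chi^2_{0.95}(12) = 21.03, \qquad \chi^2_{0.05}(12) = 5.23.$$

Hence, the 90% confidence interval for $\sigma^2$ is

$$\left[\, \frac{128.41}{21.03},\ \ \frac{128.41}{5.23} \right],$$

that is

$$[6.11,\ 24.55].$$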
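The calculation of Example 17.9 can be reproduced from the two raw sums. In this sketch the function name `variance_interval_unknown_mean` is hypothetical, and the chi-square quantiles 5.23 and 21.03 are the table values quoted above.

```python
def variance_interval_unknown_mean(sum_x, sum_x2, n, chi2_lower, chi2_upper):
    """Interval [SS / chi2_upper, SS / chi2_lower], SS = sum (x_i - xbar)^2."""
    xbar = sum_x / n
    ss = sum_x2 - n * xbar**2   # shortcut formula for sum (x_i - xbar)^2
    return ss / chi2_upper, ss / chi2_lower

# Example 17.9: n = 13, sum x_i = 246.61, sum x_i^2 = 4806.61.
lower, upper = variance_interval_unknown_mean(
    246.61, 4806.61, 13,
    chi2_lower=5.23,    # chi^2_{0.05}(12)
    chi2_upper=21.03,   # chi^2_{0.95}(12)
)
print(round(lower, 2), round(upper, 2))
```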


Example 17.10. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution $N(\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are unknown parameters. What is the shortest 90% confidence interval for the standard deviation $\sigma$?

Answer: Let $S^2$ be the sample variance. Then

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1).$$

Using this random variable as a pivot, we can construct a $100(1-\alpha)\%$ confidence interval for $\sigma$ from

$$1 - \alpha = P\left( a \le \frac{(n-1)S^2}{\sigma^2} \le b \right)$$

by suitably choosing the constants $a$ and $b$. Hence, the confidence interval for $\sigma$ is given by

$$\left[\, \sqrt{\frac{(n-1)S^2}{b}},\ \ \sqrt{\frac{(n-1)S^2}{a}} \,\right].$$

The length of this confidence interval is given by

$$L(a,b) = S\sqrt{n-1}\left[ \frac{1}{\sqrt{a}} - \frac{1}{\sqrt{b}} \right].$$

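In order to find the shortest confidence interval, we should find a pair of constants $a$ and $b$ such that $L(a,b)$ is minimum. Thus, we have a constrained minimization problem. That is

$$\left.\begin{array}{l}
\text{Minimize } L(a,b) = S\sqrt{n-1}\left[ \dfrac{1}{\sqrt{a}} - \dfrac{1}{\sqrt{b}} \right] \\[2ex]
\text{Subject to the condition } \displaystyle\int_a^b f(u)\,du = 1 - \alpha,
\end{array}\right\} \qquad \text{(MP)}$$

where

$$f(x) = \frac{1}{\Gamma\!\left(\frac{n-1}{2}\right) 2^{\frac{n-1}{2}}}\; x^{\frac{n-1}{2} - 1}\, e^{-\frac{x}{2}}.$$

Differentiating $L$ with respect to $a$, we get

$$\frac{dL}{da} = S\sqrt{n-1}\left( -\frac{1}{2}\, a^{-\frac{3}{2}} + \frac{1}{2}\, b^{-\frac{3}{2}}\, \frac{db}{da} \right).$$

From

$$\int_a^b f(u)\,du = 1 - \alpha,$$

we find the derivative of $b$ with respect to $a$ as follows:

$$\frac{d}{da}\int_a^b f(u)\,du = \frac{d}{da}(1 - \alpha),$$

that is

$$f(b)\,\frac{db}{da} - f(a) = 0.$$

Thus, we have

$$\frac{db}{da} = \frac{f(a)}{f(b)}.$$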
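Substituting this into the expression for the derivative of $L$, we get

$$\frac{dL}{da} = S\sqrt{n-1}\left( -\frac{1}{2}\, a^{-\frac{3}{2}} + \frac{1}{2}\, b^{-\frac{3}{2}}\, \frac{f(a)}{f(b)} \right).$$

Setting this derivative to zero, we get

$$S\sqrt{n-1}\left( -\frac{1}{2}\, a^{-\frac{3}{2}} + \frac{1}{2}\, b^{-\frac{3}{2}}\, \frac{f(a)}{f(b)} \right) = 0,$$

which yields

$$a^{\frac{3}{2}} f(a) = b^{\frac{3}{2}} f(b).$$

Using the form of $f$, we get from the above expression

$$a^{\frac{3}{2}}\, a^{\frac{n-3}{2}}\, e^{-\frac{a}{2}} = b^{\frac{3}{2}}\, b^{\frac{n-3}{2}}\, e^{-\frac{b}{2}},$$

that is

$$a^{\frac{n}{2}}\, e^{-\frac{a}{2}} = b^{\frac{n}{2}}\, e^{-\frac{b}{2}}.$$

From this we get

$$\ln\left(\frac{a}{b}\right) = \frac{a - b}{n}.$$

Hence, to obtain the pair of constants $a$ and $b$ that will produce the shortest confidence interval for $\sigma$, we have to solve the following system of nonlinear equations

$$\left.\begin{array}{l}
\displaystyle\int_a^b f(u)\,du = 1 - \alpha \\[2ex]
\ln\left(\dfrac{a}{b}\right) = \dfrac{a - b}{n}.
\end{array}\right\} \qquad (\star)$$

If $a_0$ and $b_0$ are solutions of $(\star)$, then the shortest confidence interval for $\sigma$ is given by

$$\left[\, \sqrt{\frac{(n-1)S^2}{b_0}},\ \ \sqrt{\frac{(n-1)S^2}{a_0}} \,\right].$$

Since this system of nonlinear equations is hard to solve analytically, numerical solutions are given in the statistical literature in the form of a table for finding the shortest interval for the variance.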
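In place of a table, the system of Example 17.10 can also be solved numerically. The following is one possible sketch using only the Python standard library, not a method taken from the text: for each trial $a$ it finds the matching $b > n$ from $\ln(a/b) = (a-b)/n$ by bisection, then bisects on $a$ until the $\chi^2(n-1)$ probability of $[a, b]$ equals $1 - \alpha$. The function names, the series used for the chi-square cdf, and the bracketing constants are all assumptions of this illustration.

```python
from math import exp, lgamma, log, sqrt

def chi2_cdf(x, df):
    """Chi-square cdf via the power series of the regularized incomplete gamma."""
    if x <= 0:
        return 0.0
    s, y = df / 2, x / 2
    term = exp(-y + s * log(y) - lgamma(s + 1))
    total, k = term, 0
    while term > 1e-16 * total:
        k += 1
        term *= y / (s + k)
        total += term
    return min(total, 1.0)

def matching_b(a, n):
    """For 0 < a < n, the root b > n of ln(a/b) = (a - b)/n."""
    target = n * log(a) - a          # equation reads n ln b - b = n ln a - a
    lo, hi = float(n), 2.0 * n
    while n * log(hi) - hi > target:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if n * log(mid) - mid > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def shortest_sigma_constants(n, alpha):
    """Constants (a0, b0) solving the two equations of the system."""
    df = n - 1
    def coverage(a):
        return chi2_cdf(matching_b(a, n), df) - chi2_cdf(a, df)
    lo, hi = 1e-6, float(n)          # coverage is near 1 at lo, near 0 at hi
    for _ in range(100):
        mid = (lo + hi) / 2
        if coverage(mid) > 1 - alpha:
            lo = mid
        else:
            hi = mid
    a = (lo + hi) / 2
    return a, matching_b(a, n)

a0, b0 = shortest_sigma_constants(n=10, alpha=0.10)
print(a0, b0, 1 / sqrt(a0) - 1 / sqrt(b0))
```

For $n = 10$ the resulting factor $1/\sqrt{a_0} - 1/\sqrt{b_0}$ can be compared with the equal-tail factor obtained from the table values $\chi^2_{0.05}(9) = 3.325$ and $\chi^2_{0.95}(9) = 16.919$; the shortest interval should be no longer.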
17.5. Confidence Interval for Parameter of some Distributions
not belonging to the Location-Scale Family

In this section, we illustrate the pivotal quantity method for finding confidence intervals for a parameter $\theta$ when the density function does not belong to the location-scale family. The following density functions do not belong to the location-scale family:

$$f(x;\theta) = \begin{cases} \theta x^{\theta - 1} & \text{if } 0 < x < 1 \\ 0 & \text{otherwise,} \end{cases}$$

$$f(x;\theta) = \begin{cases} \frac{1}{\theta} & \text{if } 0 < x < \theta \\ 0 & \text{otherwise.} \end{cases}$$

We will construct interval estimators for the parameters in these density functions. The same idea for finding the interval estimators can be used to find interval estimators for parameters of density functions that belong to the location-scale family such as

$$f(x;\theta) = \begin{cases} \frac{1}{\theta}\, e^{-\frac{x}{\theta}} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise.} \end{cases}$$


To find the pivotal quantities for the above mentioned distributions and 
others we need the following three results. The first result is Theorem 6.2 
while the proof of the second result is easy and we leave it to the reader. 

Theorem 17.1. Let $F(x;\theta)$ be the cumulative distribution function of a continuous random variable $X$. Then

$$F(X;\theta) \sim UNIF(0,1).$$

Theorem 17.2. If $X \sim UNIF(0,1)$, then

$$-\ln X \sim EXP(1).$$

Theorem 17.3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with density function

$$f(x;\theta) = \begin{cases} \frac{1}{\theta}\, e^{-\frac{x}{\theta}} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta > 0$ is a parameter. Then the random variable

$$\frac{2}{\theta}\sum_{i=1}^{n} X_i \sim \chi^2(2n).$$

Proof: Let $Y = \frac{2}{\theta}\sum_{i=1}^{n} X_i$. Now we show that the sampling distribution of $Y$ is chi-square with $2n$ degrees of freedom. We use the moment generating method to show this. The moment generating function of $Y$ is given by

$$\begin{aligned}
M_Y(t) &= M_{\frac{2}{\theta}\sum_{i=1}^{n} X_i}(t) \\
&= \prod_{i=1}^{n} M_{X_i}\!\left(\frac{2}{\theta}\, t\right) \\
&= \prod_{i=1}^{n} \left(1 - \theta\, \frac{2}{\theta}\, t\right)^{-1} \\
&= (1 - 2t)^{-n} \\
&= (1 - 2t)^{-\frac{2n}{2}}.
\end{aligned}$$

Since $(1 - 2t)^{-\frac{2n}{2}}$ corresponds to the moment generating function of a chi-square random variable with $2n$ degrees of freedom, we conclude that

$$\frac{2}{\theta}\sum_{i=1}^{n} X_i \sim \chi^2(2n).$$

Theorem 17.4. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with density function

$$f(x;\theta) = \begin{cases} \theta x^{\theta - 1} & \text{if } 0 < x < 1 \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta > 0$ is a parameter. Then the random variable $-2\theta \sum_{i=1}^{n} \ln X_i$ has a chi-square distribution with $2n$ degrees of freedom.

Proof: We are given that

$$X_i \sim \theta x^{\theta - 1}, \qquad 0 < x < 1.$$

Hence, the cdf of $f$ is

$$F(x;\theta) = \int_0^x \theta\, t^{\theta - 1}\, dt = x^{\theta}.$$

Thus by Theorem 17.1, each

$$F(X_i;\theta) \sim UNIF(0,1),$$

that is

$$X_i^{\theta} \sim UNIF(0,1).$$

By Theorem 17.2, each

$$-\ln X_i^{\theta} \sim EXP(1),$$

that is

$$-\theta \ln X_i \sim EXP(1).$$

By Theorem 17.3 (with $\theta = 1$), we obtain

$$-2\theta \sum_{i=1}^{n} \ln X_i \sim \chi^2(2n).$$

Hence, the sampling distribution of $-2\theta \sum_{i=1}^{n} \ln X_i$ is chi-square with $2n$ degrees of freedom.

The following theorem, whose proof follows from Theorems 17.1, 17.2 and 17.3, is the key to finding pivotal quantities for many distributions that do not belong to the location-scale family. Further, this theorem can also be used for finding the pivotal quantities for parameters of some distributions that belong to the location-scale family.

Theorem 17.5. Let $X_1, X_2, \ldots, X_n$ be a random sample from a continuous population $X$ with a distribution function $F(x;\theta)$. If $F(x;\theta)$ is monotone in $\theta$, then the statistic $Q = -2\sum_{i=1}^{n} \ln F(X_i;\theta)$ is a pivotal quantity and has a chi-square distribution with $2n$ degrees of freedom (that is, $Q \sim \chi^2(2n)$).

It should be noted that the condition that $F(x;\theta)$ is monotone in $\theta$ is needed to ensure an interval. Otherwise we may get a confidence region instead of a confidence interval. Further note that the statistic $-2\sum_{i=1}^{n} \ln\left(1 - F(X_i;\theta)\right)$ also has a chi-square distribution with $2n$ degrees of freedom, that is

$$-2\sum_{i=1}^{n} \ln\left(1 - F(X_i;\theta)\right) \sim \chi^2(2n).$$

Example 17.11. If $X_1, X_2, \ldots, X_n$ is a random sample from a population with density

$$f(x;\theta) = \begin{cases} \theta x^{\theta - 1} & \text{if } 0 < x < 1 \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta > 0$ is an unknown parameter, what is a $100(1-\alpha)\%$ confidence interval for $\theta$?

Answer: To construct a confidence interval for $\theta$, we need a pivotal quantity. That is, we need a random variable which is a function of the sample and the parameter, and whose probability distribution is known but does not involve $\theta$. We use the random variable

$$Q = -2\theta \sum_{i=1}^{n} \ln X_i \sim \chi^2(2n)$$

as the pivotal quantity. The $100(1-\alpha)\%$ confidence interval for $\theta$ can be constructed from

$$\begin{aligned}
1 - \alpha &= P\left( \chi^2_{\alpha/2}(2n) \le Q \le \chi^2_{1-\alpha/2}(2n) \right) \\
&= P\left( \chi^2_{\alpha/2}(2n) \le -2\theta \sum_{i=1}^{n} \ln X_i \le \chi^2_{1-\alpha/2}(2n) \right) \\
&= P\left( \frac{\chi^2_{\alpha/2}(2n)}{-2\sum_{i=1}^{n} \ln X_i} \le \theta \le \frac{\chi^2_{1-\alpha/2}(2n)}{-2\sum_{i=1}^{n} \ln X_i} \right).
\end{aligned}$$

Hence, the $100(1-\alpha)\%$ confidence interval for $\theta$ is given by

$$\left[\, \frac{\chi^2_{\alpha/2}(2n)}{-2\sum_{i=1}^{n} \ln X_i},\ \ \frac{\chi^2_{1-\alpha/2}(2n)}{-2\sum_{i=1}^{n} \ln X_i} \right].$$

Here $\chi^2_{1-\alpha/2}(2n)$ denotes the $\left(1 - \frac{\alpha}{2}\right)$-quantile of a chi-square random variable $Y$, that is

$$P\left( Y \le \chi^2_{1-\alpha/2}(2n) \right) = 1 - \frac{\alpha}{2},$$

and $\chi^2_{\alpha/2}(2n)$ similarly denotes the $\frac{\alpha}{2}$-quantile of $Y$, that is

$$P\left( Y \le \chi^2_{\alpha/2}(2n) \right) = \frac{\alpha}{2}$$

for $\alpha \le 0.5$ (see figure below).



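Example 17.12. If $X_1, X_2, \ldots, X_n$ is a random sample from a distribution with density function

$$f(x;\theta) = \begin{cases} \frac{1}{\theta} & \text{if } 0 < x < \theta \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta > 0$ is a parameter, then what is the $100(1-\alpha)\%$ confidence interval for $\theta$?

Answer: The cumulative distribution function of $f(x;\theta)$ is

$$F(x;\theta) = \begin{cases} \frac{x}{\theta} & \text{if } 0 < x < \theta \\ 0 & \text{otherwise.} \end{cases}$$

Since

$$-2\sum_{i=1}^{n} \ln F(X_i;\theta) = -2\sum_{i=1}^{n} \ln\left(\frac{X_i}{\theta}\right) = 2n \ln\theta - 2\sum_{i=1}^{n} \ln X_i,$$

by Theorem 17.5, the quantity $2n\ln\theta - 2\sum_{i=1}^{n} \ln X_i \sim \chi^2(2n)$. Since $2n\ln\theta - 2\sum_{i=1}^{n} \ln X_i$ is a function of the sample and the parameter and its distribution is independent of $\theta$, it is a pivot for $\theta$. Hence, we take

$$Q(X_1, X_2, \ldots, X_n, \theta) = 2n\ln\theta - 2\sum_{i=1}^{n} \ln X_i.$$

The $100(1-\alpha)\%$ confidence interval for $\theta$ can be constructed from

$$\begin{aligned}
1 - \alpha &= P\left( \chi^2_{\alpha/2}(2n) \le Q \le \chi^2_{1-\alpha/2}(2n) \right) \\
&= P\left( \chi^2_{\alpha/2}(2n) \le 2n\ln\theta - 2\sum_{i=1}^{n} \ln X_i \le \chi^2_{1-\alpha/2}(2n) \right) \\
&= P\left( \chi^2_{\alpha/2}(2n) + 2\sum_{i=1}^{n} \ln X_i \le 2n\ln\theta \le \chi^2_{1-\alpha/2}(2n) + 2\sum_{i=1}^{n} \ln X_i \right) \\
&= P\left( e^{\frac{1}{2n}\left\{ \chi^2_{\alpha/2}(2n) + 2\sum_{i=1}^{n} \ln X_i \right\}} \le \theta \le e^{\frac{1}{2n}\left\{ \chi^2_{1-\alpha/2}(2n) + 2\sum_{i=1}^{n} \ln X_i \right\}} \right).
\end{aligned}$$

Hence, the $100(1-\alpha)\%$ confidence interval for $\theta$ is given by

$$\left[\, e^{\frac{1}{2n}\left\{ \chi^2_{\alpha/2}(2n) + 2\sum_{i=1}^{n} \ln X_i \right\}},\ \ e^{\frac{1}{2n}\left\{ \chi^2_{1-\alpha/2}(2n) + 2\sum_{i=1}^{n} \ln X_i \right\}} \,\right].$$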


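The density function of the following example belongs to the scale family. However, one can use Theorem 17.5 to find a pivot for the parameter and determine the interval estimators for the parameter.

Example 17.13. If $X_1, X_2, \ldots, X_n$ is a random sample from a distribution with density function

$$f(x;\theta) = \begin{cases} \frac{1}{\theta}\, e^{-\frac{x}{\theta}} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta > 0$ is a parameter, then what is the $100(1-\alpha)\%$ confidence interval for $\theta$?

Answer: The cumulative distribution function $F(x;\theta)$ of the density function

$$f(x;\theta) = \begin{cases} \frac{1}{\theta}\, e^{-\frac{x}{\theta}} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise} \end{cases}$$

is given by

$$F(x;\theta) = 1 - e^{-\frac{x}{\theta}}.$$

Hence

$$-2\sum_{i=1}^{n} \ln\left(1 - F(X_i;\theta)\right) = \frac{2}{\theta}\sum_{i=1}^{n} X_i.$$

Thus

$$\frac{2}{\theta}\sum_{i=1}^{n} X_i \sim \chi^2(2n).$$

We take $Q = \frac{2}{\theta}\sum_{i=1}^{n} X_i$ as the pivotal quantity. The $100(1-\alpha)\%$ confidence interval for $\theta$ can be constructed using

$$\begin{aligned}
1 - \alpha &= P\left( \chi^2_{\alpha/2}(2n) \le Q \le \chi^2_{1-\alpha/2}(2n) \right) \\
&= P\left( \chi^2_{\alpha/2}(2n) \le \frac{2}{\theta}\sum_{i=1}^{n} X_i \le \chi^2_{1-\alpha/2}(2n) \right) \\
&= P\left( \frac{2\sum_{i=1}^{n} X_i}{\chi^2_{1-\alpha/2}(2n)} \le \theta \le \frac{2\sum_{i=1}^{n} X_i}{\chi^2_{\alpha/2}(2n)} \right).
\end{aligned}$$

Hence, the $100(1-\alpha)\%$ confidence interval for $\theta$ is given by

$$\left[\, \frac{2\sum_{i=1}^{n} X_i}{\chi^2_{1-\alpha/2}(2n)},\ \ \frac{2\sum_{i=1}^{n} X_i}{\chi^2_{\alpha/2}(2n)} \right].$$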
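A small Monte Carlo experiment gives a feel for what the exact interval of Example 17.13 delivers. The sketch below simulates exponential samples of size $n = 49$ and counts how often the interval covers the true $\theta$. The choice $\theta = 2$, the number of repetitions, the seed and the function names are arbitrary assumptions of this illustration; the quantiles $\chi^2_{0.05}(98) = 77.93$ and $\chi^2_{0.95}(98) = 124.34$ are the table values used in Example 17.17.

```python
import random

def exact_exponential_interval(xs, chi2_lower, chi2_upper):
    """Interval [2 sum(x) / chi2_upper, 2 sum(x) / chi2_lower] for theta."""
    total = sum(xs)
    return 2 * total / chi2_upper, 2 * total / chi2_lower

def estimate_coverage(theta, n, chi2_lower, chi2_upper, reps=2000, seed=0):
    """Fraction of simulated samples whose interval contains theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # expovariate takes the rate 1/theta, so each draw has mean theta.
        xs = [rng.expovariate(1 / theta) for _ in range(n)]
        lower, upper = exact_exponential_interval(xs, chi2_lower, chi2_upper)
        hits += lower <= theta <= upper
    return hits / reps

coverage = estimate_coverage(theta=2.0, n=49, chi2_lower=77.93, chi2_upper=124.34)
print(coverage)
```

The estimated coverage should land close to the nominal 0.90, up to simulation noise.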


In this section, we have seen that a $100(1-\alpha)\%$ confidence interval for the parameter $\theta$ can be constructed by taking the pivotal quantity $Q$ to be either

$$Q = -2\sum_{i=1}^{n} \ln F(X_i;\theta)$$

or

$$Q = -2\sum_{i=1}^{n} \ln\left(1 - F(X_i;\theta)\right).$$

In either case, the distribution of $Q$ is chi-squared with $2n$ degrees of freedom, that is $Q \sim \chi^2(2n)$. Since the chi-squared distribution is not symmetric, the confidence intervals constructed in this section do not have the shortest length. In order to have a shortest confidence interval one has to solve the following minimization problem:

$$\left.\begin{array}{l}
\text{Minimize } L(a,b) \\[1ex]
\text{Subject to the condition } \displaystyle\int_a^b f(u)\,du = 1 - \alpha,
\end{array}\right\} \qquad \text{(MP)}$$

where

$$f(x) = \frac{1}{\Gamma(n)\, 2^{n}}\; x^{n-1}\, e^{-\frac{x}{2}}.$$

In the case of Example 17.13, the minimization process leads to the following system of nonlinear equations

$$\left.\begin{array}{l}
\displaystyle\int_a^b f(u)\,du = 1 - \alpha \\[2ex]
\ln\left(\dfrac{a}{b}\right) = \dfrac{a - b}{2(n+1)}.
\end{array}\right\} \qquad \text{(NE)}$$

If $a_0$ and $b_0$ are solutions of (NE), then the shortest confidence interval for $\theta$ is given by

$$\left[\, \frac{2\sum_{i=1}^{n} X_i}{b_0},\ \ \frac{2\sum_{i=1}^{n} X_i}{a_0} \right].$$

17.6. Approximate Confidence Interval for Parameter with MLE 

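In this section, we discuss how to construct an approximate $(1-\alpha)100\%$ confidence interval for a population parameter $\theta$ using its maximum likelihood estimator $\hat{\theta}$. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population $X$ with density $f(x;\theta)$. Let $\hat{\theta}$ be the maximum likelihood estimator of $\theta$. If the sample size $n$ is large, then using the asymptotic property of the maximum likelihood estimator, we have

$$\frac{\hat{\theta} - E\left(\hat{\theta}\right)}{\sqrt{Var\left(\hat{\theta}\right)}} \sim N(0,1) \quad \text{as } n \to \infty,$$

where $Var\left(\hat{\theta}\right)$ denotes the variance of the estimator $\hat{\theta}$. Since, for large $n$, the maximum likelihood estimator of $\theta$ is unbiased, we get

$$\frac{\hat{\theta} - \theta}{\sqrt{Var\left(\hat{\theta}\right)}} \sim N(0,1) \quad \text{as } n \to \infty.$$

The variance $Var\left(\hat{\theta}\right)$ can be computed directly whenever possible or using the Cramér-Rao lower bound

$$Var\left(\hat{\theta}\right) \ge \frac{-1}{E\left[ \dfrac{d^2 \ln L(\theta)}{d\theta^2} \right]}.$$

Now using $Q = \dfrac{\hat{\theta} - \theta}{\sqrt{Var\left(\hat{\theta}\right)}}$ as the pivotal quantity, we construct an approximate $(1-\alpha)100\%$ confidence interval for $\theta$ as

$$1 - \alpha = P\left( -z_{\alpha/2} \le Q \le z_{\alpha/2} \right) = P\left( -z_{\alpha/2} \le \frac{\hat{\theta} - \theta}{\sqrt{Var\left(\hat{\theta}\right)}} \le z_{\alpha/2} \right).$$

If $Var\left(\hat{\theta}\right)$ is free of $\theta$, then we have

$$1 - \alpha = P\left( \hat{\theta} - z_{\alpha/2}\sqrt{Var\left(\hat{\theta}\right)} \le \theta \le \hat{\theta} + z_{\alpha/2}\sqrt{Var\left(\hat{\theta}\right)} \right).$$

Thus the $100(1-\alpha)\%$ approximate confidence interval for $\theta$ is

$$\left[\, \hat{\theta} - z_{\alpha/2}\sqrt{Var\left(\hat{\theta}\right)},\ \ \hat{\theta} + z_{\alpha/2}\sqrt{Var\left(\hat{\theta}\right)} \,\right],$$

provided $Var\left(\hat{\theta}\right)$ is free of $\theta$.

Remark 17.5. In many situations $Var\left(\hat{\theta}\right)$ is not free of the parameter $\theta$. In those situations we still use the above form of the confidence interval by replacing the parameter $\theta$ by $\hat{\theta}$ in the expression of $Var\left(\hat{\theta}\right)$.

Next, we give some examples to illustrate this method.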

Example 17.14. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population $X$ with probability density function

$$f(x;p) = \begin{cases} p^x (1-p)^{(1-x)} & \text{if } x = 0, 1 \\ 0 & \text{otherwise.} \end{cases}$$

What is a $100(1-\alpha)\%$ approximate confidence interval for the parameter $p$?

Answer: The likelihood function of the sample is given by

$$L(p) = \prod_{i=1}^{n} p^{x_i} (1-p)^{(1-x_i)}.$$

Taking the logarithm of the likelihood function, we get

$$\ln L(p) = \sum_{i=1}^{n} \left[ x_i \ln p + (1 - x_i)\ln(1-p) \right].$$

Differentiating the above expression, we get

$$\frac{d \ln L(p)}{dp} = \frac{1}{p}\sum_{i=1}^{n} x_i - \frac{1}{1-p}\sum_{i=1}^{n} (1 - x_i).$$

Setting this equal to zero and solving for $p$, we get

$$\frac{n\bar{x}}{p} - \frac{n - n\bar{x}}{1-p} = 0,$$

that is

$$(1-p)\, n\bar{x} = p\,(n - n\bar{x}),$$

which is

$$n\bar{x} - p\,n\bar{x} = p\,n - p\,n\bar{x}.$$

Hence

$$p = \bar{x}.$$

Therefore, the maximum likelihood estimator of $p$ is given by

$$\hat{p} = \bar{X}.$$

The variance of $\bar{X}$ is

$$Var\left(\bar{X}\right) = \frac{\sigma^2}{n}.$$

Since $X \sim Ber(p)$, the variance $\sigma^2 = p(1-p)$, and

$$Var\left(\hat{p}\right) = Var\left(\bar{X}\right) = \frac{p(1-p)}{n}.$$

Since $Var\left(\hat{p}\right)$ is not free of the parameter $p$, we replace $p$ by $\hat{p}$ in the expression of $Var\left(\hat{p}\right)$ to get

$$Var\left(\hat{p}\right) \simeq \frac{\hat{p}(1-\hat{p})}{n}.$$

The $100(1-\alpha)\%$ approximate confidence interval for the parameter $p$ is given by

$$\left[\, \hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \,\right],$$

which is

$$\left[\, \bar{X} - z_{\alpha/2}\sqrt{\frac{\bar{X}(1-\bar{X})}{n}},\ \ \bar{X} + z_{\alpha/2}\sqrt{\frac{\bar{X}(1-\bar{X})}{n}} \,\right].$$

The above confidence interval is a $100(1-\alpha)\%$ approximate confidence interval for a proportion.

Example 17.15. A poll was taken of university students before a student election. Of 78 students contacted, 33 said they would vote for Mr. Smith. The population may be taken as 2200. Obtain 95% confidence limits for the proportion of voters in the population in favor of Mr. Smith.

Answer: The sample proportion $\hat{p}$ is given by

$$\hat{p} = \frac{33}{78} = 0.4231.$$

Hence

$$\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{(0.4231)(0.5769)}{78}} = 0.0559.$$

The upper 2.5th percentile of the standard normal distribution is given by

$$z_{0.025} = 1.96 \quad \text{(from table)}.$$

Hence, the lower confidence limit of the 95% confidence interval is

$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.4231 - (1.96)(0.0559) = 0.4231 - 0.1096 = 0.3135.$$

Similarly, the upper confidence limit of the 95% confidence interval is

$$\hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.4231 + (1.96)(0.0559) = 0.4231 + 0.1096 = 0.5327.$$

Hence, the 95% confidence limits for the proportion of voters in the population in favor of Smith are 0.3135 and 0.5327.

Remark 17.6. In Example 17.15, the 95% approximate confidence interval for the parameter $p$ was $[0.3135,\ 0.5327]$. This confidence interval can be improved to a shorter interval by means of a quadratic inequality. Now we explain how the interval can be improved. First note that in Example 17.14, which we are using for Example 17.15, the approximate value of the variance of the ML estimator $\hat{p}$ was obtained to be $\frac{p(1-p)}{n}$. However, this is the exact variance of $\hat{p}$. Now the pivotal quantity $Q = \frac{\hat{p} - p}{\sqrt{Var(\hat{p})}}$ becomes

$$Q = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}.$$

Using this pivotal quantity, we can construct a 95% confidence interval as

$$0.95 = P\left( -z_{0.025} \le \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \le z_{0.025} \right) = P\left( \left| \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \right| \le 1.96 \right).$$

Using $\hat{p} = 0.4231$ and $n = 78$, we solve the inequality

$$\left| \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \right| \le 1.96,$$

which is

$$\left| \frac{0.4231 - p}{\sqrt{\frac{p(1-p)}{78}}} \right| \le 1.96.$$

Squaring both sides of the above inequality and simplifying, we get

$$78\,(0.4231 - p)^2 \le (1.96)^2 \left(p - p^2\right).$$

The last inequality is equivalent to

$$13.96306158 - 69.84520000\,p + 81.84160000\,p^2 \le 0.$$

Solving this quadratic inequality, we obtain $[0.3196,\ 0.5338]$ as a 95% confidence interval for $p$. This interval is an improvement since its length is 0.2142, whereas the length of the interval $[0.3135,\ 0.5327]$ is 0.2192.
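Both intervals for the poll data can be computed side by side. In this sketch the function names are hypothetical; the second function solves the quadratic $(n + z^2)p^2 - (2n\hat{p} + z^2)p + n\hat{p}^2 \le 0$ obtained by squaring the inequality of Remark 17.6, and the inputs $\hat{p} = 0.4231$, $n = 78$, $z = 1.96$ are the figures used in the text.

```python
from math import sqrt

def plug_in_interval(p_hat, n, z):
    """Interval of Example 17.14: p_hat -/+ z * sqrt(p_hat (1 - p_hat) / n)."""
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def quadratic_interval(p_hat, n, z):
    """Roots of (n + z^2) p^2 - (2 n p_hat + z^2) p + n p_hat^2 = 0."""
    a = n + z**2
    b = -(2 * n * p_hat + z**2)
    c = n * p_hat**2
    root = sqrt(b**2 - 4 * a * c)
    return (-b - root) / (2 * a), (-b + root) / (2 * a)

plug_in = plug_in_interval(0.4231, 78, 1.96)
quadratic = quadratic_interval(0.4231, 78, 1.96)
print([round(v, 4) for v in plug_in])
print([round(v, 4) for v in quadratic])
print(round(quadratic[1] - quadratic[0], 4), round(plug_in[1] - plug_in[0], 4))
```

The last line prints the two lengths, so the shortening reported in the remark can be read off directly.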
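Example 17.16. If $X_1, X_2, \ldots, X_n$ is a random sample from a population with density

$$f(x;\theta) = \begin{cases} \theta x^{\theta - 1} & \text{if } 0 < x < 1 \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta > 0$ is an unknown parameter, what is a $100(1-\alpha)\%$ approximate confidence interval for $\theta$ if the sample size is large?

Answer: The likelihood function $L(\theta)$ of the sample is

$$L(\theta) = \prod_{i=1}^{n} \theta\, x_i^{\theta - 1}.$$

Hence

$$\ln L(\theta) = n \ln\theta + (\theta - 1)\sum_{i=1}^{n} \ln x_i.$$

The first derivative of the logarithm of the likelihood function is

$$\frac{d}{d\theta} \ln L(\theta) = \frac{n}{\theta} + \sum_{i=1}^{n} \ln x_i.$$

Setting this derivative to zero and solving for $\theta$, we obtain

$$\theta = -\frac{n}{\sum_{i=1}^{n} \ln x_i}.$$

Hence, the maximum likelihood estimator of $\theta$ is given by

$$\hat{\theta} = -\frac{n}{\sum_{i=1}^{n} \ln X_i}.$$

Finding the variance of this estimator is difficult. We compute its variance by computing the Cramér-Rao bound for this estimator. The second derivative of the logarithm of the likelihood function is given by

$$\frac{d^2}{d\theta^2} \ln L(\theta) = \frac{d}{d\theta}\left( \frac{n}{\theta} + \sum_{i=1}^{n} \ln x_i \right) = -\frac{n}{\theta^2}.$$

Hence

$$E\left( \frac{d^2}{d\theta^2} \ln L(\theta) \right) = -\frac{n}{\theta^2}.$$

Therefore

$$Var\left(\hat{\theta}\right) \ge \frac{\theta^2}{n}.$$

Thus we take

$$Var\left(\hat{\theta}\right) \simeq \frac{\theta^2}{n}.$$

Since $Var\left(\hat{\theta}\right)$ has $\theta$ in its expression, we replace the unknown $\theta$ by its estimate $\hat{\theta}$ so that

$$Var\left(\hat{\theta}\right) \simeq \frac{\hat{\theta}^2}{n}.$$

The $100(1-\alpha)\%$ approximate confidence interval for $\theta$ is given by

$$\left[\, \hat{\theta} - z_{\alpha/2}\, \frac{\hat{\theta}}{\sqrt{n}},\ \ \hat{\theta} + z_{\alpha/2}\, \frac{\hat{\theta}}{\sqrt{n}} \,\right],$$

which is

$$\left[\, -\frac{n}{\sum_{i=1}^{n} \ln X_i} + z_{\alpha/2}\left( \frac{\sqrt{n}}{\sum_{i=1}^{n} \ln X_i} \right),\ \ -\frac{n}{\sum_{i=1}^{n} \ln X_i} - z_{\alpha/2}\left( \frac{\sqrt{n}}{\sum_{i=1}^{n} \ln X_i} \right) \right].$$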


Remark 17.7. In Example 17.11, we derived the exact confidence interval for $\theta$ for this population distribution. The exact $100(1-\alpha)\%$ confidence interval for $\theta$ was given by

$$\left[\, -\frac{\chi^2_{\alpha/2}(2n)}{2\sum_{i=1}^{n} \ln X_i},\ \ -\frac{\chi^2_{1-\alpha/2}(2n)}{2\sum_{i=1}^{n} \ln X_i} \right].$$

Note that this exact confidence interval is not the shortest confidence interval for the parameter $\theta$.
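To see how the approximate interval of Example 17.16 compares with the exact interval of Remark 17.7, both can be evaluated on the figures of Example 17.17 ($n = 49$, $\sum \ln x_i = -0.7567$). This is a sketch: the function names are hypothetical, and $z_{0.05} = 1.64$, $\chi^2_{0.05}(98) = 77.93$ and $\chi^2_{0.95}(98) = 124.34$ are the table values quoted there. Because the code does not round the intermediate quantities, its endpoints may differ from the worked values in the last printed digit.

```python
from math import sqrt

def approximate_interval(sum_log_x, n, z):
    """MLE-based interval: theta_hat -/+ z * theta_hat / sqrt(n)."""
    theta_hat = -n / sum_log_x
    half_width = z * theta_hat / sqrt(n)
    return theta_hat - half_width, theta_hat + half_width

def exact_interval(sum_log_x, chi2_lower, chi2_upper):
    """Pivot-based interval: chi-square quantiles divided by -2 sum ln x_i."""
    denom = -2 * sum_log_x
    return chi2_lower / denom, chi2_upper / denom

approx = approximate_interval(-0.7567, 49, z=1.64)
exact = exact_interval(-0.7567, chi2_lower=77.93, chi2_upper=124.34)
print([round(v, 2) for v in approx])
print([round(v, 2) for v in exact])
```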

Example 17.17. If $X_1, X_2, \ldots, X_{49}$ is a random sample from a population with density

$$f(x;\theta) = \begin{cases} \theta x^{\theta - 1} & \text{if } 0 < x < 1 \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta > 0$ is an unknown parameter, what are the 90% approximate and exact confidence intervals for $\theta$ if $\sum_{i=1}^{49} \ln X_i = -0.7567$?

Answer: We are given the following:

$$n = 49, \qquad \sum_{i=1}^{49} \ln X_i = -0.7567, \qquad 1 - \alpha = 0.90.$$

Hence, we get

$$z_{0.05} = 1.64,$$

$$\frac{n}{\sum_{i=1}^{n} \ln X_i} = \frac{49}{-0.7567} = -64.75$$

and

$$\frac{\sqrt{n}}{\sum_{i=1}^{n} \ln X_i} = \frac{7}{-0.7567} = -9.25.$$

Hence, the approximate confidence interval is given by

$$[\, 64.75 - (1.64)(9.25),\ \ 64.75 + (1.64)(9.25) \,],$$

that is $[49.58,\ 79.92]$.

Next, we compute the exact 90% confidence interval for $\theta$ using the formula

$$\left[\, -\frac{\chi^2_{\alpha/2}(2n)}{2\sum_{i=1}^{n} \ln X_i},\ \ -\frac{\chi^2_{1-\alpha/2}(2n)}{2\sum_{i=1}^{n} \ln X_i} \right].$$

From the chi-square table, we get

$$\chi^2_{0.05}(98) = 77.93 \qquad \text{and} \qquad \chi^2_{0.95}(98) = 124.34.$$

Hence, the exact 90% confidence interval is

$$\left[\, \frac{77.93}{(2)(0.7567)},\ \ \frac{124.34}{(2)(0.7567)} \right],$$

that is $[51.49,\ 82.16]$.

Example 17.18. If $X_1, X_2, \ldots, X_n$ is a random sample from a population with density

$$f(x;\theta) = \begin{cases} (1-\theta)\,\theta^x & \text{if } x = 0, 1, 2, \ldots, \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $0 < \theta < 1$ is an unknown parameter, what is a $100(1-\alpha)\%$ approximate confidence interval for $\theta$ if the sample size is large?

Answer: The logarithm of the likelihood function of the sample is

$$\ln L(\theta) = \ln\theta \sum_{i=1}^{n} x_i + n\ln(1-\theta).$$
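Differentiating, we obtain

$$\frac{d}{d\theta} \ln L(\theta) = \frac{\sum_{i=1}^{n} x_i}{\theta} - \frac{n}{1-\theta}.$$

Equating this derivative to zero and solving for $\theta$, we get $\theta = \frac{\bar{x}}{1+\bar{x}}$. Thus, the maximum likelihood estimator of $\theta$ is given by

$$\hat{\theta} = \frac{\bar{X}}{1 + \bar{X}}.$$

Next, we find the variance of this estimator using the Cramér-Rao lower bound. For this, we need the second derivative of $\ln L(\theta)$. Hence

$$\frac{d^2}{d\theta^2} \ln L(\theta) = -\frac{n\bar{x}}{\theta^2} - \frac{n}{(1-\theta)^2}.$$

Therefore

$$\begin{aligned}
E\left( \frac{d^2}{d\theta^2} \ln L(\theta) \right) &= E\left( -\frac{n\bar{X}}{\theta^2} - \frac{n}{(1-\theta)^2} \right) \\
&= -\frac{n}{\theta^2}\, E\left(\bar{X}\right) - \frac{n}{(1-\theta)^2} \\
&= -\frac{n}{\theta^2}\, \frac{\theta}{1-\theta} - \frac{n}{(1-\theta)^2} \qquad \left(\text{since each } E(X_i) = \tfrac{\theta}{1-\theta}\right) \\
&= -\frac{n}{1-\theta}\left[ \frac{1}{\theta} + \frac{1}{1-\theta} \right] \\
&= -\frac{n}{\theta\,(1-\theta)^2}.
\end{aligned}$$

Therefore

$$Var\left(\hat{\theta}\right) \simeq \frac{\hat{\theta}\left(1 - \hat{\theta}\right)^2}{n}.$$

The $100(1-\alpha)\%$ approximate confidence interval for $\theta$ is given by

$$\left[\, \hat{\theta} - z_{\alpha/2}\left(1 - \hat{\theta}\right)\sqrt{\frac{\hat{\theta}}{n}},\ \ \hat{\theta} + z_{\alpha/2}\left(1 - \hat{\theta}\right)\sqrt{\frac{\hat{\theta}}{n}} \,\right],$$

where

$$\hat{\theta} = \frac{\bar{X}}{1 + \bar{X}}.$$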


17.7. The Statistical or General Method 

Now we briefly describe the statistical or general method for constructing a confidence interval. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with density $f(x;\theta)$, where $\theta$ is an unknown parameter. We want to determine an interval estimator for $\theta$. Let $T(X_1, X_2, \ldots, X_n)$ be some statistic having the density function $g(t;\theta)$. Let $p_1$ and $p_2$ be two fixed positive numbers in the open interval $(0,1)$ with $p_1 + p_2 < 1$. Now we define two functions $h_1(\theta)$ and $h_2(\theta)$ as follows:

$$p_1 = \int_{-\infty}^{h_1(\theta)} g(t;\theta)\,dt \qquad \text{and} \qquad p_2 = \int_{h_2(\theta)}^{\infty} g(t;\theta)\,dt$$

such that

$$P\left( h_1(\theta) < T(X_1, X_2, \ldots, X_n) < h_2(\theta) \right) = 1 - p_1 - p_2.$$

If $h_1(\theta)$ and $h_2(\theta)$ are monotone functions in $\theta$, then we can find a confidence interval

$$P\left( u_1 < \theta < u_2 \right) = 1 - p_1 - p_2,$$

where $u_1 = u_1(t)$ and $u_2 = u_2(t)$. The statistic $T(X_1, X_2, \ldots, X_n)$ may be a sufficient statistic, or a maximum likelihood estimator. If we minimize the length $u_2 - u_1$ of the confidence interval, subject to the condition $1 - p_1 - p_2 = 1 - \alpha$ for $0 < \alpha < 1$, we obtain the shortest confidence interval based on the statistic $T$.

17.8. Criteria for Evaluating Confidence Intervals 

In many situations, one can have more than one confidence intervals for 
the same parameter 9. Thus it necessary to have a set of criteria to decide 
whether a particular interval is better than the other intervals. Some well 
known criteria are: (1) Shortest Length and (2) Unbiasedness. Now we only 
briefly describe these criteria. 

The criterion of shortest length demands that a good 100(1 — a)% con¬ 
fidence interval [L, U] of a parameter 9 should have the shortest length 
l = U — L. In the pivotal quantity method one finds a pivot Q for a parameter 
9 and then converting the probability statement 


P(a < Q < b) = 1 — a 
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to 

P(L < 9 < U) = 1 - a 

obtains a 100(1— a)% confidence interval for 9. If the constants a and b can be 
found such that the difference U — L depending on the sample X -\, ,..., X n 

is minimum for every realization of the sample, then the random interval 
[L, U] is said to be the shortest confidence interval based on Q. 

If the pivotal quantity Q has certain type of density functions, then one 
can easily construct confidence interval of shortest length. The following 
result is important in this regard. 

Theorem 17.6. Let the density function of the pivot Q ~ h(q: 9) be continu¬ 
ous and unimodal. If in some interval [a, 6] the density function h has a mode, 
and satisfies conditions (i) / Q b h(q; 9)dq = 1 — a and (ii) h(a) = h(b) > 0, then 
the interval [a, b] is of the shortest length among all intervals that satisfy 
condition (i). 

If the density function is not unimodal, then minimization of $\ell$ is necessary to construct a shortest confidence interval. One weakness of the shortest length criterion is that in some cases $\ell$ could be a random variable. Often, the expected length of the interval $E(\ell) = E(U - L)$ is also used as a criterion for evaluating the goodness of an interval. However, this too has a weakness: minimization of $E(\ell)$ depends on the unknown true value of the parameter $\theta$. If the sample size is very large, then every approximate confidence interval constructed using the MLE method has minimum expected length.

A confidence interval is only shortest relative to a particular pivot $Q$. It is possible to find another pivot $Q^*$ which may yield an even shorter interval than the shortest interval based on $Q$. The question that naturally arises is how to find the pivot that gives the shortest confidence interval among all pivots. It has been pointed out that a pivotal quantity $Q$ which is a function of the complete and sufficient statistics gives the shortest confidence interval.

Unbiasedness is yet another criterion for judging the goodness of an interval estimator. Unbiasedness is defined as follows. A $100(1-\alpha)\%$ confidence interval $[L, U]$ of the parameter $\theta$ is said to be unbiased if

$$P(L \le \theta^* \le U) \;\begin{cases} \ge 1 - \alpha & \text{if } \theta^* = \theta \\ \le 1 - \alpha & \text{if } \theta^* \ne \theta. \end{cases}$$

17.9. Review Exercises

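1. Let $X_1, X_2, ..., X_n$ be a random sample from a population with gamma density function

$$f(x; \theta, \beta) = \begin{cases} \dfrac{1}{\Gamma(\beta)\,\theta^{\beta}}\, x^{\beta-1} e^{-\frac{x}{\theta}} & \text{for } 0 < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta$ is an unknown parameter and $\beta > 0$ is a known parameter. Show that

$$\left[ \frac{2\sum_{i=1}^n X_i}{\chi^2_{1-\frac{\alpha}{2}}(2n\beta)},\; \frac{2\sum_{i=1}^n X_i}{\chi^2_{\frac{\alpha}{2}}(2n\beta)} \right]$$

is a $100(1-\alpha)\%$ confidence interval for the parameter $\theta$.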

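2. Let $X_1, X_2, ..., X_n$ be a random sample from a population with Weibull density function

$$f(x; \theta, \beta) = \begin{cases} \dfrac{\beta}{\theta}\, x^{\beta-1} e^{-\frac{x^{\beta}}{\theta}} & \text{for } 0 < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta$ is an unknown parameter and $\beta > 0$ is a known parameter. Show that

$$\left[ \frac{2\sum_{i=1}^n X_i^{\beta}}{\chi^2_{1-\frac{\alpha}{2}}(2n)},\; \frac{2\sum_{i=1}^n X_i^{\beta}}{\chi^2_{\frac{\alpha}{2}}(2n)} \right]$$

is a $100(1-\alpha)\%$ confidence interval for the parameter $\theta$.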

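3. Let $X_1, X_2, ..., X_n$ be a random sample from a population with Pareto density function

$$f(x; \theta, \beta) = \begin{cases} \theta\, \beta^{\theta}\, x^{-(\theta+1)} & \text{for } \beta \le x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta$ is an unknown parameter and $\beta > 0$ is a known parameter. Show that

$$\left[ \frac{2\sum_{i=1}^n \ln\left(\frac{X_i}{\beta}\right)}{\chi^2_{1-\frac{\alpha}{2}}(2n)},\; \frac{2\sum_{i=1}^n \ln\left(\frac{X_i}{\beta}\right)}{\chi^2_{\frac{\alpha}{2}}(2n)} \right]$$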

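is a $100(1-\alpha)\%$ confidence interval for $\frac{1}{\theta}$.

4. Let $X_1, X_2, ..., X_n$ be a random sample from a population with Laplace density function

$$f(x; \theta) = \frac{1}{2\theta}\, e^{-\frac{|x|}{\theta}}, \qquad -\infty < x < \infty,$$

where $\theta$ is an unknown parameter. Show that

$$\left[ \frac{2\sum_{i=1}^n |X_i|}{\chi^2_{1-\frac{\alpha}{2}}(2n)},\; \frac{2\sum_{i=1}^n |X_i|}{\chi^2_{\frac{\alpha}{2}}(2n)} \right]$$

is a $100(1-\alpha)\%$ confidence interval for $\theta$.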

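5. Let $X_1, X_2, ..., X_n$ be a random sample from a population with density function

$$f(x; \theta) = \begin{cases} \dfrac{1}{2\theta^2}\, x^3 e^{-\frac{x^2}{2\theta}} & \text{for } 0 < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta$ is an unknown parameter. Show that

$$\left[ \frac{\sum_{i=1}^n X_i^2}{\chi^2_{1-\frac{\alpha}{2}}(4n)},\; \frac{\sum_{i=1}^n X_i^2}{\chi^2_{\frac{\alpha}{2}}(4n)} \right]$$

is a $100(1-\alpha)\%$ confidence interval for $\theta$.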

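6. Let $X_1, X_2, ..., X_n$ be a random sample from a population with density function

$$f(x; \theta, \beta) = \begin{cases} \beta\, \theta\, \dfrac{x^{\beta-1}}{(1 + x^{\beta})^{\theta+1}} & \text{for } 0 < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta$ is an unknown parameter and $\beta > 0$ is a known parameter. Show that

$$\left[ \frac{\chi^2_{\frac{\alpha}{2}}(2n)}{2\sum_{i=1}^n \ln\left(1 + X_i^{\beta}\right)},\; \frac{\chi^2_{1-\frac{\alpha}{2}}(2n)}{2\sum_{i=1}^n \ln\left(1 + X_i^{\beta}\right)} \right]$$

is a $100(1-\alpha)\%$ confidence interval for $\theta$.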

7. Let $X_1, X_2, ..., X_n$ be a random sample from a population with density function

$$f(x; \theta) = \begin{cases} e^{-(x-\theta)} & \text{if } \theta < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta \in \mathbb{R}$ is an unknown parameter. Then show that $Q = X_{(1)} - \theta$ is a pivotal quantity. Using this pivotal quantity find a $100(1-\alpha)\%$ confidence interval for $\theta$.


8. Let $X_1, X_2, ..., X_n$ be a random sample from a population with density function

$$f(x; \theta) = \begin{cases} e^{-(x-\theta)} & \text{if } \theta < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta \in \mathbb{R}$ is an unknown parameter. Then show that $Q = 2n\left(X_{(1)} - \theta\right)$ is a pivotal quantity. Using this pivotal quantity find a $100(1-\alpha)\%$ confidence interval for $\theta$.


9. Let $X_1, X_2, ..., X_n$ be a random sample from a population with density function

$$f(x; \theta) = \begin{cases} e^{-(x-\theta)} & \text{if } \theta < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where 9 GlR is an unknown parameter. Then show that Q = is a 

pivotal quantity. Using this pivotal quantity find a 100(1 — a)% confidence 
interval for 9. 


10. Let $X_1, X_2, ..., X_n$ be a random sample from a population with uniform density function

$$f(x; \theta) = \begin{cases} \dfrac{1}{\theta} & \text{if } 0 \le x \le \theta \\ 0 & \text{otherwise,} \end{cases}$$


where 0 < 8 is an unknown parameter. Then show that Q = is a pivotal 
quantity. Using this pivotal quantity find a 100(1 — a)% confidence interval 
for 9. 


11. Let $X_1, X_2, ..., X_n$ be a random sample from a population with uniform density function

$$f(x; \theta) = \begin{cases} \dfrac{1}{\theta} & \text{if } 0 \le x \le \theta \\ 0 & \text{otherwise,} \end{cases}$$

where $0 < \theta$ is an unknown parameter. Then show that $Q = \frac{X_{(1)}}{\theta}$ is a pivotal quantity. Using this pivotal quantity find a $100(1-\alpha)\%$ confidence
interval for $\theta$.

12. If $X_1, X_2, ..., X_n$ is a random sample from a population with density

$$f(x; \theta) = \begin{cases} \sqrt{\dfrac{2}{\pi}}\; e^{-\frac{1}{2}(x-\theta)^2} & \text{if } \theta \le x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta$ is an unknown parameter, what is a $100(1-\alpha)\%$ approximate confidence interval for $\theta$ if the sample size is large?

13. Let $X_1, X_2, ..., X_n$ be a random sample of size $n$ from a distribution with a probability density function

$$f(x; \theta) = \begin{cases} (\theta + 1)\, x^{-\theta-2} & \text{if } 1 < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $0 < \theta$ is a parameter. What is a $100(1-\alpha)\%$ approximate confidence interval for $\theta$ if the sample size is large?

14. Let $X_1, X_2, ..., X_n$ be a random sample of size $n$ from a distribution with a probability density function

$$f(x; \theta) = \begin{cases} \theta^2\, x\, e^{-\theta x} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $0 < \theta$ is a parameter. What is a $100(1-\alpha)\%$ approximate confidence interval for $\theta$ if the sample size is large?


15. Let $X_1, X_2, ..., X_n$ be a random sample from a distribution with density function

$$f(x; \beta) = \begin{cases} \dfrac{1}{\beta}\, e^{-\frac{(x-4)}{\beta}} & \text{for } x > 4 \\ 0 & \text{otherwise,} \end{cases}$$

where $\beta > 0$. What is a $100(1-\alpha)\%$ approximate confidence interval for $\beta$ if the sample size is large?


16. Let $X_1, X_2, ..., X_n$ be a random sample from a distribution with density function

$$f(x; \theta) = \begin{cases} \dfrac{1}{\theta} & \text{for } 0 < x < \theta \\ 0 & \text{otherwise,} \end{cases}$$

where $0 < \theta$. What is a $100(1-\alpha)\%$ approximate confidence interval for $\theta$ if the sample size is large?

17. A sample $X_1, X_2, ..., X_n$ of size $n$ is drawn from a gamma distribution

$$f(x; \theta) = \begin{cases} \dfrac{x^3 e^{-\frac{x}{\theta}}}{6\, \theta^4} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise.} \end{cases}$$

What is a $100(1-\alpha)\%$ approximate confidence interval for $\theta$ if the sample size is large?

18. Let $X_1, X_2, ..., X_n$ be a random sample from a continuous population $X$ with a distribution function $F(x; \theta)$. Show that the statistic $Q = -2 \sum_{i=1}^n \ln F(X_i; \theta)$ is a pivotal quantity and has a chi-square distribution with $2n$ degrees of freedom.

19. Let $X_1, X_2, ..., X_n$ be a random sample from a continuous population $X$ with a distribution function $F(x; \theta)$. Show that the statistic $Q = -2 \sum_{i=1}^n \ln\left(1 - F(X_i; \theta)\right)$ is a pivotal quantity and has a chi-square distribution with $2n$ degrees of freedom.

Chapter 18

TEST OF STATISTICAL HYPOTHESES FOR PARAMETERS

18.1. Introduction 

Inferential statistics consists of estimation and hypothesis testing. We have already discussed various methods of finding point and interval estimators of parameters. We have also examined the goodness of an estimator.

Suppose $X_1, X_2, ..., X_n$ is a random sample from a population with probability density function given by

$$f(x; \theta) = \begin{cases} (1 + \theta)\, x^{\theta} & \text{for } 0 < x < 1 \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta > 0$ is an unknown parameter. Further, let $n = 4$ and suppose $x_1 = 0.92$, $x_2 = 0.75$, $x_3 = 0.85$, $x_4 = 0.8$ is a set of random sample data from the above distribution. If we apply the maximum likelihood method, then we will find that the estimator $\hat{\theta}$ of $\theta$ is

$$\hat{\theta} = -1 - \frac{4}{\ln(X_1) + \ln(X_2) + \ln(X_3) + \ln(X_4)}.$$

Hence, the maximum likelihood estimate of $\theta$ is

$$\hat{\theta} = -1 - \frac{4}{\ln(0.92) + \ln(0.75) + \ln(0.85) + \ln(0.80)} = -1 + \frac{4}{0.7567} = 4.2861.$$

Therefore, the corresponding probability density function of the population is given by

$$f(x) = \begin{cases} 5.2861\, x^{4.2861} & \text{for } 0 < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$

Since the point estimate will rarely equal the true value of $\theta$, we would like to report a range of values with some degree of confidence. If we want to report an interval of values for $\theta$ with a confidence level of 90%, then we need a 90% confidence interval for $\theta$. If we use the pivotal quantity method, then we will find that the confidence interval for $\theta$ is

$$\left[ -1 - \frac{\chi^2_{\frac{\alpha}{2}}(8)}{2 \sum_{i=1}^4 \ln X_i},\; -1 - \frac{\chi^2_{1-\frac{\alpha}{2}}(8)}{2 \sum_{i=1}^4 \ln X_i} \right].$$

Since $\chi^2_{0.05}(8) = 2.73$, $\chi^2_{0.95}(8) = 15.51$, and $\sum_{i=1}^4 \ln(x_i) = -0.7567$, we obtain

$$\left[ -1 + \frac{2.73}{2(0.7567)},\; -1 + \frac{15.51}{2(0.7567)} \right]$$

which is

$$[0.803,\; 9.249].$$
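The arithmetic above can be checked with a short Python sketch. The two chi-square quantiles are copied from the table values quoted in the text rather than computed, and the variable names are illustrative only.

```python
import math

data = [0.92, 0.75, 0.85, 0.80]
n = len(data)
log_sum = sum(math.log(x) for x in data)   # about -0.7567

# Maximum likelihood estimate for f(x; theta) = (1 + theta) x^theta on (0, 1).
theta_hat = -1 - n / log_sum

# Chi-square quantiles with 2n = 8 degrees of freedom, taken from the text.
chi2_lower, chi2_upper = 2.73, 15.51

# 90% confidence interval from the pivot -2 (1 + theta) * sum(ln X_i).
ci_lower = -1 - chi2_lower / (2 * log_sum)
ci_upper = -1 - chi2_upper / (2 * log_sum)

print(f"sum of logs = {log_sum:.4f}")
print(f"MLE         = {theta_hat:.4f}")
print(f"90% CI      = [{ci_lower:.3f}, {ci_upper:.3f}]")
```

Small differences in the last digit relative to the text can arise because the text rounds the sum of logarithms to four decimals before dividing.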


Thus we may draw the inference, at a 90% confidence level, that the population $X$ has the distribution

$$f(x; \theta) = \begin{cases} (1 + \theta)\, x^{\theta} & \text{for } 0 < x < 1 \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta \in [0.803, 9.249]$. If we think carefully, we will notice that we have made one assumption. The assumption is that the observable quantity $X$ can be modeled by the density function shown above. Since we are concerned with parametric statistics, our assumption is in fact about $\theta$.


Based on the sample data, we found that an interval estimate of $\theta$ at a 90% confidence level is $[0.803, 9.249]$. But we assumed that $\theta \in [0.803, 9.249]$. However, we cannot be sure that our assumption regarding the parameter is real and is not due to chance in the random sampling process. The validation of this assumption can be done by a hypothesis test. In this chapter, we discuss testing of statistical hypotheses. Most of the ideas regarding hypothesis testing came from Jerzy Neyman and Egon Pearson during 1928-1938.

Definition 18.1. A statistical hypothesis $H$ is a conjecture about the distribution $f(x; \theta)$ of a population $X$. This conjecture is usually about the parameter $\theta$ if one is dealing with parametric statistics; otherwise it is about the form of the distribution of $X$.

Definition 18.2. A hypothesis $H$ is said to be a simple hypothesis if $H$ completely specifies the density $f(x; \theta)$ of the population; otherwise it is called a composite hypothesis.

Definition 18.3. The hypothesis to be tested is called the null hypothesis. The negation of the null hypothesis is called the alternative hypothesis. The null and alternative hypotheses are denoted by $H_o$ and $H_a$, respectively.

If $\theta$ denotes a population parameter, then the general format of the null hypothesis and alternative hypothesis is

$$H_o : \theta \in \Omega_o \quad \text{and} \quad H_a : \theta \in \Omega_a \qquad (\star)$$

where $\Omega_o$ and $\Omega_a$ are subsets of the parameter space $\Omega$ with

$$\Omega_o \cap \Omega_a = \emptyset \quad \text{and} \quad \Omega_o \cup \Omega_a \subseteq \Omega.$$

Remark 18.1. If $\Omega_o \cup \Omega_a = \Omega$, then $(\star)$ becomes

$$H_o : \theta \in \Omega_o \quad \text{and} \quad H_a : \theta \notin \Omega_o.$$

If $\Omega_o$ is a singleton set, then $H_o$ reduces to a simple hypothesis. For example, if $\Omega_o = \{4.2861\}$, the null hypothesis becomes $H_o : \theta = 4.2861$ and the alternative hypothesis becomes $H_a : \theta \ne 4.2861$. Hence, the null hypothesis $H_o : \theta = 4.2861$ is a simple hypothesis and the alternative $H_a : \theta \ne 4.2861$ is a composite hypothesis.

Definition 18.4. A hypothesis test is an ordered sequence

$$(X_1, X_2, ..., X_n;\; H_o, H_a;\; C)$$

where $X_1, X_2, ..., X_n$ is a random sample from a population $X$ with the probability density function $f(x; \theta)$, $H_o$ and $H_a$ are hypotheses concerning the parameter $\theta$ in $f(x; \theta)$, and $C$ is a Borel set in $\mathbb{R}^n$.

Remark 18.2. Borel sets are defined using the notion of $\sigma$-algebra. A collection of subsets $\mathcal{A}$ of a set $S$ is called a $\sigma$-algebra if (i) $S \in \mathcal{A}$, (ii) $A^c \in \mathcal{A}$ whenever $A \in \mathcal{A}$, and (iii) $\bigcup_{k=1}^{\infty} A_k \in \mathcal{A}$ whenever $A_1, A_2, ..., A_n, ... \in \mathcal{A}$. The Borel sets are the members of the smallest $\sigma$-algebra containing all open sets of $\mathbb{R}^n$. Two examples of Borel sets in $\mathbb{R}^n$ are the sets that arise by countable union of closed intervals in $\mathbb{R}^n$, and countable intersection of open sets in $\mathbb{R}^n$.

The set $C$ is called the critical region in the hypothesis test. The critical region is obtained using a test statistic $W(X_1, X_2, ..., X_n)$. If the outcome of $(X_1, X_2, ..., X_n)$ turns out to be an element of $C$, then we decide to accept $H_a$; otherwise we accept $H_o$.

Broadly speaking, a hypothesis test is a rule that tells us for which sample values we should decide to accept $H_o$ as true and for which sample values we should decide to reject $H_o$ and accept $H_a$ as true. Typically, a hypothesis test is specified in terms of a test statistic $W$. For example, a test might specify that $H_o$ is to be rejected if the sample total $\sum_{k=1}^n X_k$ is less than 8. In this case the critical region $C$ is the set $\{(x_1, x_2, ..., x_n) \mid x_1 + x_2 + \cdots + x_n < 8\}$.
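Viewed as code, such a test is simply a function from the sample to a decision. The following minimal Python sketch encodes the rule "reject when the sample total is less than 8"; the function names and the two sample vectors are hypothetical and serve only as an illustration.

```python
def in_critical_region(sample, threshold=8.0):
    """Return True when the sample total falls below the threshold."""
    return sum(sample) < threshold

def decide(sample):
    """Reject H_o inside the critical region C, accept it otherwise."""
    return "reject H_o" if in_critical_region(sample) else "accept H_o"

print(decide([1.5, 2.0, 3.1]))   # total 6.6 lies in C
print(decide([4.0, 3.5, 2.5]))   # total 10.0 lies outside C
```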

18.2. A Method of Finding Tests 

There are several methods to find test procedures, among them: (1) Likelihood Ratio Tests, (2) Invariant Tests, (3) Bayesian Tests, and (4) Union-Intersection and Intersection-Union Tests. In this section, we only examine likelihood ratio tests.

Definition 18.5. The likelihood ratio test statistic for testing the simple null hypothesis $H_o : \theta \in \Omega_o$ against the composite alternative hypothesis $H_a : \theta \notin \Omega_o$ based on a set of random sample data $x_1, x_2, ..., x_n$ is defined as

$$W(x_1, x_2, ..., x_n) = \frac{\max_{\theta \in \Omega_o} L(\theta, x_1, x_2, ..., x_n)}{\max_{\theta \in \Omega} L(\theta, x_1, x_2, ..., x_n)},$$

where $\Omega$ denotes the parameter space, and $L(\theta, x_1, x_2, ..., x_n)$ denotes the likelihood function of the random sample, that is

$$L(\theta, x_1, x_2, ..., x_n) = \prod_{i=1}^n f(x_i; \theta).$$

A likelihood ratio test (LRT) is any test that has a critical region $C$ (that is, rejection region) of the form

$$C = \{(x_1, x_2, ..., x_n) \mid W(x_1, x_2, ..., x_n) \le k\},$$

where $k$ is a number in the unit interval $[0, 1]$.

If $H_o : \theta = \theta_o$ and $H_a : \theta = \theta_a$ are both simple hypotheses, then the likelihood ratio test statistic is defined as

$$W(x_1, x_2, ..., x_n) = \frac{L(\theta_o, x_1, x_2, ..., x_n)}{L(\theta_a, x_1, x_2, ..., x_n)}.$$

Now we give some examples to illustrate this definition. 

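Example 18.1. Let $X_1, X_2, X_3$ denote three independent observations from a distribution with density

$$f(x; \theta) = \begin{cases} (1 + \theta)\, x^{\theta} & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise.} \end{cases}$$

What is the form of the LRT critical region for testing $H_o : \theta = 1$ versus $H_a : \theta = 2$?

Answer: In this example, $\theta_o = 1$ and $\theta_a = 2$. By the above definition, the form of the critical region is given by

$$\begin{aligned}
C &= \left\{ (x_1, x_2, x_3) \in \mathbb{R}^3 \;\middle|\; \frac{L(\theta_o, x_1, x_2, x_3)}{L(\theta_a, x_1, x_2, x_3)} \le k \right\} \\
&= \left\{ (x_1, x_2, x_3) \in \mathbb{R}^3 \;\middle|\; \frac{(1 + \theta_o)^3 \prod_{i=1}^3 x_i^{\theta_o}}{(1 + \theta_a)^3 \prod_{i=1}^3 x_i^{\theta_a}} \le k \right\} \\
&= \left\{ (x_1, x_2, x_3) \in \mathbb{R}^3 \;\middle|\; \frac{8\, x_1 x_2 x_3}{27\, x_1^2 x_2^2 x_3^2} \le k \right\} \\
&= \left\{ (x_1, x_2, x_3) \in \mathbb{R}^3 \;\middle|\; \frac{1}{x_1 x_2 x_3} \le \frac{27}{8}\, k \right\} \\
&= \left\{ (x_1, x_2, x_3) \in \mathbb{R}^3 \;\middle|\; x_1 x_2 x_3 \ge a \right\},
\end{aligned}$$

where $a$ is some constant. Hence the likelihood ratio test is of the form: "Reject $H_o$ if $\prod_{i=1}^3 X_i \ge a$."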
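The simplification in Example 18.1 can be checked numerically. The sketch below evaluates the ratio $L(\theta_o)/L(\theta_a)$ directly from the density and compares it with the closed form $8/(27\, x_1 x_2 x_3)$; the observations in `xs` are hypothetical values chosen only for illustration.

```python
import math

def likelihood(theta, xs):
    """Likelihood of f(x; theta) = (1 + theta) x^theta for observations in (0, 1)."""
    return math.prod((1 + theta) * x ** theta for x in xs)

def lr_statistic(xs, theta_o=1.0, theta_a=2.0):
    """Ratio L(theta_o) / L(theta_a) that defines the critical region."""
    return likelihood(theta_o, xs) / likelihood(theta_a, xs)

xs = (0.5, 0.8, 0.9)                       # hypothetical observations
ratio = lr_statistic(xs)
closed_form = 8 / (27 * math.prod(xs))     # simplified form of the same ratio
print(ratio, closed_form)
```

Because the ratio is a decreasing function of the product $x_1 x_2 x_3$, small values of the ratio correspond to large values of the product, which is why the critical region takes the form $x_1 x_2 x_3 \ge a$.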

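Example 18.2. Let $X_1, X_2, ..., X_{12}$ be a random sample from a normal population with mean zero and variance $\sigma^2$. What is the form of the LRT critical region for testing the null hypothesis $H_o : \sigma^2 = 10$ versus $H_a : \sigma^2 = 5$?

Answer: Here $\sigma_o^2 = 10$ and $\sigma_a^2 = 5$. By the above definition, the form of the critical region is given by

$$\begin{aligned}
C &= \left\{ (x_1, x_2, ..., x_{12}) \in \mathbb{R}^{12} \;\middle|\; \frac{L(\sigma_o^2, x_1, x_2, ..., x_{12})}{L(\sigma_a^2, x_1, x_2, ..., x_{12})} \le k \right\} \\
&= \left\{ (x_1, x_2, ..., x_{12}) \in \mathbb{R}^{12} \;\middle|\; \prod_{i=1}^{12} \frac{\frac{1}{\sqrt{2\pi\sigma_o^2}}\, e^{-\frac{1}{2}\left(\frac{x_i}{\sigma_o}\right)^2}}{\frac{1}{\sqrt{2\pi\sigma_a^2}}\, e^{-\frac{1}{2}\left(\frac{x_i}{\sigma_a}\right)^2}} \le k \right\} \\
&= \left\{ (x_1, x_2, ..., x_{12}) \in \mathbb{R}^{12} \;\middle|\; \left(\frac{1}{2}\right)^6 e^{\frac{1}{20} \sum_{i=1}^{12} x_i^2} \le k \right\} \\
&= \left\{ (x_1, x_2, ..., x_{12}) \in \mathbb{R}^{12} \;\middle|\; \sum_{i=1}^{12} x_i^2 \le a \right\},
\end{aligned}$$

where $a$ is some constant. Hence the likelihood ratio test is of the form: "Reject $H_o$ if $\sum_{i=1}^{12} X_i^2 \le a$."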

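Example 18.3. Suppose that $X$ is a random variable about which the hypothesis $H_o : X \sim UNIF(0, 1)$ against $H_a : X \sim N(0, 1)$ is to be tested. What is the form of the LRT critical region based on one observation of $X$?

Answer: In this example, $L_o(x) = 1$ and $L_a(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} x^2}$. By the above definition, the form of the critical region is given by

$$\begin{aligned}
C &= \left\{ x \in \mathbb{R} \;\middle|\; \frac{L_o(x)}{L_a(x)} \le k \right\} \\
&= \left\{ x \in \mathbb{R} \;\middle|\; \sqrt{2\pi}\, e^{\frac{1}{2} x^2} \le k \right\} \\
&= \left\{ x \in \mathbb{R} \;\middle|\; x^2 \le 2 \ln\left(\frac{k}{\sqrt{2\pi}}\right) \right\} \\
&= \left\{ x \in \mathbb{R} \;\middle|\; x \le a \right\},
\end{aligned}$$

where $k \in [0, \infty)$ and $a$ is some constant. Hence the likelihood ratio test is of the form: "Reject $H_o$ if $X \le a$."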

In the above three examples, we have dealt with the case when the null as well as the alternative hypothesis were simple. If the null hypothesis is simple (for example, $H_o : \theta = \theta_o$) and the alternative is a composite hypothesis (for example, $H_a : \theta \ne \theta_o$), then the following algorithm can be used to construct the likelihood ratio critical region:

(1) Find the likelihood function $L(\theta, x_1, x_2, ..., x_n)$ for the given sample.

(2) Find $L(\theta_o, x_1, x_2, ..., x_n)$.

(3) Find $\max_{\theta \in \Omega} L(\theta, x_1, x_2, ..., x_n)$.

(4) Rewrite $\dfrac{L(\theta_o, x_1, x_2, ..., x_n)}{\max_{\theta \in \Omega} L(\theta, x_1, x_2, ..., x_n)}$ in a "suitable form".

(5) Use step (4) to construct the critical region.

Now we give an example to illustrate these steps.


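Example 18.4. Let $X$ be a single observation from a population with probability density

$$f(x; \theta) = \begin{cases} \dfrac{\theta^x e^{-\theta}}{x!} & \text{for } x = 0, 1, 2, ..., \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $\theta > 0$. Find the likelihood ratio critical region for testing the null hypothesis $H_o : \theta = 2$ against the composite alternative $H_a : \theta \ne 2$.

Answer: The likelihood function based on one observation $x$ is

$$L(\theta, x) = \frac{\theta^x e^{-\theta}}{x!}.$$

Next, we find $L(\theta_o, x)$, which is given by

$$L(2, x) = \frac{2^x e^{-2}}{x!}.$$

Our next step is to evaluate $\max_{\theta > 0} L(\theta, x)$. For this we differentiate $L(\theta, x)$ with respect to $\theta$, and then set the derivative to 0 and solve for $\theta$. Hence

$$\frac{dL(\theta, x)}{d\theta} = \frac{1}{x!} \left[ e^{-\theta}\, x\, \theta^{x-1} - \theta^x e^{-\theta} \right]$$

and $\frac{dL(\theta, x)}{d\theta} = 0$ gives $\theta = x$. Hence

$$\max_{\theta > 0} L(\theta, x) = \frac{x^x e^{-x}}{x!}.$$

To do step (4), we consider

$$\frac{L(2, x)}{\max_{\theta \in \Omega} L(\theta, x)} = \frac{\frac{2^x e^{-2}}{x!}}{\frac{x^x e^{-x}}{x!}}$$

which simplifies to

$$\frac{L(2, x)}{\max_{\theta \in \Omega} L(\theta, x)} = \left(\frac{2e}{x}\right)^x e^{-2}.$$

Thus, the likelihood ratio critical region is given by

$$C = \left\{ x \in \{0, 1, 2, ...\} \;\middle|\; \left(\frac{2e}{x}\right)^x e^{-2} \le k \right\} = \left\{ x \in \{0, 1, 2, ...\} \;\middle|\; \left(\frac{2e}{x}\right)^x \le a \right\}$$

where $a$ is some constant. The likelihood ratio test is of the form: "Reject $H_o$ if $\left(\frac{2e}{X}\right)^X \le a$."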
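The five steps can be traced numerically for Example 18.4. The sketch below evaluates the ratio $L(2, x) / \max_{\theta} L(\theta, x)$ for a few values of $x$; the function name is hypothetical, and the case $x = 0$ is handled separately under the assumption that the maximum of $e^{-\theta}$ is taken as its limiting value 1 as $\theta \to 0$.

```python
import math

def poisson_lr_statistic(x, theta_o=2.0):
    """L(theta_o, x) / max_theta L(theta, x) for one Poisson observation x."""
    if x == 0:
        # sup of e^{-theta} over theta > 0 is 1 (approached as theta -> 0).
        return math.exp(-theta_o)
    # (theta_o * e / x)^x * e^{-theta_o}, computed on the log scale.
    return math.exp(x * (math.log(theta_o) + 1 - math.log(x)) - theta_o)

for x in range(0, 8):
    print(x, round(poisson_lr_statistic(x), 4))
```

The statistic equals 1 at $x = 2$, where the data agree exactly with $H_o$, and falls toward 0 as $x$ moves away from 2 in either direction, so a region of the form "ratio at most $k$" rejects for observations far from 2.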


So far, we have learned how to find tests for testing the null hypothesis 
against the alternative hypothesis. However, we have not considered the 
goodness of these tests. In the next section, we consider various criteria for 
evaluating the goodness of a hypothesis test. 

18.3. Methods of Evaluating Tests 

There are several criteria to evaluate the goodness of a test procedure. Some well known criteria are: (1) Powerfulness, (2) Unbiasedness and Invariancy, and (3) Local Powerfulness. In order to examine some of these criteria, we need some terminology such as error probabilities, power functions, type I error, and type II error. First, we develop this terminology.

A statistical hypothesis is a conjecture about the distribution $f(x; \theta)$ of the population $X$. This conjecture is usually about the parameter $\theta$ if one is dealing with parametric statistics; otherwise it is about the form of the distribution of $X$. If the hypothesis completely specifies the density $f(x; \theta)$ of the population, then it is said to be a simple hypothesis; otherwise it is called a composite hypothesis. The hypothesis to be tested is called the null hypothesis. We often hope to reject the null hypothesis based on the sample information. The negation of the null hypothesis is called the alternative hypothesis. The null and alternative hypotheses are denoted by $H_o$ and $H_a$, respectively.

In a hypothesis test, the basic problem is to decide, based on the sample information, whether the null hypothesis is true. There are four possible situations that determine whether our decision is correct or in error. These four situations are summarized below:

                  H_o is true          H_o is false

Accept H_o        Correct Decision     Type II Error

Reject H_o        Type I Error         Correct Decision


Definition 18.6. Let $H_o : \theta \in \Omega_o$ and $H_a : \theta \notin \Omega_o$ be the null and alternative hypotheses to be tested based on a random sample $X_1, X_2, ..., X_n$ from a population $X$ with density $f(x; \theta)$, where $\theta$ is a parameter. The significance level of the hypothesis test

$$H_o : \theta \in \Omega_o \quad \text{and} \quad H_a : \theta \notin \Omega_o,$$

denoted by $\alpha$, is defined as

$$\alpha = P(\text{Type I Error}).$$

Thus, by the significance level of a hypothesis test we mean the probability of rejecting a true null hypothesis, that is

$$\alpha = P(\text{Reject } H_o \,/\, H_o \text{ is true}).$$

This is also equivalent to

$$\alpha = P(\text{Accept } H_a \,/\, H_o \text{ is true}).$$

Definition 18.7. Let $H_o : \theta \in \Omega_o$ and $H_a : \theta \notin \Omega_o$ be the null and alternative hypotheses to be tested based on a random sample $X_1, X_2, ..., X_n$ from a population $X$ with density $f(x; \theta)$, where $\theta$ is a parameter. The probability of type II error of the hypothesis test

$$H_o : \theta \in \Omega_o \quad \text{and} \quad H_a : \theta \notin \Omega_o,$$

denoted by $\beta$, is defined as

$$\beta = P(\text{Accept } H_o \,/\, H_o \text{ is false}).$$

Similarly, this is also equivalent to

$$\beta = P(\text{Accept } H_o \,/\, H_a \text{ is true}).$$

Remark 18.3. Note that $\alpha$ can be numerically evaluated if the null hypothesis is a simple hypothesis and the rejection rule is given. Similarly, $\beta$ can be evaluated if the alternative hypothesis is simple and the rejection rule is known. If the null and the alternative are composite hypotheses, then $\alpha$ and $\beta$ become functions of $\theta$.

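Example 18.5. Let $X_1, X_2, ..., X_{20}$ be a random sample from a distribution with probability density function

$$f(x; p) = \begin{cases} p^x (1 - p)^{1-x} & \text{if } x = 0, 1 \\ 0 & \text{otherwise,} \end{cases}$$

where $0 < p \le \frac{1}{2}$ is a parameter. The hypothesis $H_o : p = \frac{1}{2}$ is to be tested against $H_a : p < \frac{1}{2}$. If $H_o$ is rejected when $\sum_{i=1}^{20} X_i \le 6$, then what is the probability of type I error?

Answer: Since each observation $X_i \sim BER(p)$, the sum of the observations $\sum_{i=1}^{20} X_i \sim BIN(20, p)$. The probability of type I error is given by

$$\begin{aligned}
\alpha &= P(\text{Type I Error}) \\
&= P(\text{Reject } H_o \,/\, H_o \text{ is true}) \\
&= P\left( \sum_{i=1}^{20} X_i \le 6 \,\Big/\, H_o \text{ is true} \right) \\
&= P\left( \sum_{i=1}^{20} X_i \le 6 \,\Big/\, H_o : p = \frac{1}{2} \right) \\
&= \sum_{k=0}^{6} \binom{20}{k} \left(\frac{1}{2}\right)^k \left(1 - \frac{1}{2}\right)^{20-k} \\
&= 0.0577 \qquad \text{(from binomial table).}
\end{aligned}$$

Hence the probability of type I error is 0.0577.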
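The table lookup in Example 18.5 can be reproduced by summing the binomial probabilities directly. This is a minimal sketch using only the Python standard library; the helper name `binomial_cdf` is an illustrative choice.

```python
from math import comb

def binomial_cdf(c, n, p):
    """P(X <= c) for X ~ BIN(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c + 1))

alpha = binomial_cdf(6, 20, 0.5)   # P(sum X_i <= 6 / p = 1/2)
print(round(alpha, 4))
```

The same helper gives the type II error probability at any specific alternative, for instance `1 - binomial_cdf(6, 20, 0.3)` at $p = 0.3$.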

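Example 18.6. Let $p$ represent the proportion of defectives in a manufacturing process. To test $H_o : p \le \frac{1}{4}$ versus $H_a : p > \frac{1}{4}$, a random sample of size 5 is taken from the process. If the number of defectives is 4 or more, the null hypothesis is rejected. What is the probability of rejecting $H_o$ if $p = \frac{1}{5}$?

Answer: Let $X$ denote the number of defectives out of a random sample of size 5. Then $X$ is a binomial random variable with $n = 5$ and $p = \frac{1}{5}$. Hence, the probability of rejecting $H_o$ is given by

$$\begin{aligned}
\alpha &= P(\text{Reject } H_o \,/\, H_o \text{ is true}) \\
&= P(X \ge 4 \,/\, H_o \text{ is true}) \\
&= P\left(X \ge 4 \,\Big/\, p = \frac{1}{5}\right) \\
&= P\left(X = 4 \,\Big/\, p = \frac{1}{5}\right) + P\left(X = 5 \,\Big/\, p = \frac{1}{5}\right) \\
&= \binom{5}{4} p^4 (1 - p)^1 + \binom{5}{5} p^5 (1 - p)^0 \\
&= 5 \left(\frac{1}{5}\right)^4 \left(\frac{4}{5}\right) + \left(\frac{1}{5}\right)^5 \\
&= \left(\frac{1}{5}\right)^5 [20 + 1] \\
&= \frac{21}{3125}.
\end{aligned}$$

Hence the probability of rejecting the null hypothesis $H_o$ is $\frac{21}{3125}$.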

Example 18.7. A random sample of size 4 is taken from a normal distribution with unknown mean $\mu$ and variance $\sigma^2 > 0$. To test $H_o : \mu = 0$ against $H_a : \mu < 0$ the following test is used: "Reject $H_o$ if and only if $X_1 + X_2 + X_3 + X_4 < -20$." Find the value of $\sigma$ so that the significance level of this test will be close to 0.14.

Answer: Since

$$\begin{aligned}
0.14 &= \alpha \qquad \text{(significance level)} \\
&= P(\text{Type I Error}) \\
&= P(\text{Reject } H_o \,/\, H_o \text{ is true}) \\
&= P(X_1 + X_2 + X_3 + X_4 < -20 \,/\, H_o : \mu = 0) \\
&= P\left(\overline{X} < -5 \,/\, H_o : \mu = 0\right) \\
&= P\left( \frac{\overline{X} - 0}{\frac{\sigma}{2}} < \frac{-5 - 0}{\frac{\sigma}{2}} \right) \\
&= P\left( Z < -\frac{10}{\sigma} \right),
\end{aligned}$$

we get from the standard normal table

$$1.08 = \frac{10}{\sigma}.$$

Therefore

$$\sigma = \frac{10}{1.08} = 9.26.$$

Hence, the standard deviation has to be 9.26 so that the significance level will be close to 0.14.


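Example 18.8. A normal population has a standard deviation of 16. The critical region for testing $H_o : \mu = 5$ versus the alternative $H_a : \mu = k$ is $\overline{X} > k - 2$. What would be the value of the constant $k$ and the sample size $n$ which would allow the probability of Type I error to be 0.0228 and the probability of Type II error to be 0.1587?

Answer: It is given that the population $X \sim N(\mu, 16^2)$. Since

$$\begin{aligned}
0.0228 &= \alpha \\
&= P(\text{Type I Error}) \\
&= P(\text{Reject } H_o \,/\, H_o \text{ is true}) \\
&= P\left(\overline{X} > k - 2 \,/\, H_o : \mu = 5\right) \\
&= P\left( \frac{\overline{X} - 5}{\sqrt{\frac{256}{n}}} > \frac{k - 7}{\sqrt{\frac{256}{n}}} \right) \\
&= P\left( Z > \frac{(k - 7)\sqrt{n}}{16} \right) \\
&= 1 - P\left( Z \le \frac{(k - 7)\sqrt{n}}{16} \right).
\end{aligned}$$

Hence, from the standard normal table, we have

$$\frac{(k - 7)\sqrt{n}}{16} = 2,$$

which gives

$$(k - 7)\sqrt{n} = 32.$$

Similarly

$$\begin{aligned}
0.1587 &= P(\text{Type II Error}) \\
&= P(\text{Accept } H_o \,/\, H_a \text{ is true}) \\
&= P\left(\overline{X} \le k - 2 \,/\, H_a : \mu = k\right) \\
&= P\left( \frac{\overline{X} - k}{\sqrt{\frac{256}{n}}} \le \frac{k - 2 - k}{\sqrt{\frac{256}{n}}} \right) \\
&= P\left( Z \le -\frac{2\sqrt{n}}{16} \right) \\
&= 1 - P\left( Z \le \frac{2\sqrt{n}}{16} \right).
\end{aligned}$$

Hence $0.1587 = 1 - P\left(Z \le \frac{2\sqrt{n}}{16}\right)$ or $P\left(Z \le \frac{2\sqrt{n}}{16}\right) = 0.8413$. Thus, from the standard normal table, we have

$$\frac{2\sqrt{n}}{16} = 1,$$

which yields

$$n = 64.$$

Substituting this value of $n$ into

$$(k - 7)\sqrt{n} = 32,$$

we see that $k = 11$.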
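The two table lookups in Example 18.8 can be replaced by the inverse normal distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library; rounding $n$ to the nearest integer is an assumption made here because the quoted error probabilities are themselves rounded table values.

```python
from math import sqrt
from statistics import NormalDist

sigma = 16.0
alpha, beta = 0.0228, 0.1587
z = NormalDist()

z_alpha = z.inv_cdf(1 - alpha)   # close to 2
z_beta = z.inv_cdf(1 - beta)     # close to 1

# Type II error condition: 2 * sqrt(n) / sigma = z_beta.
n = round((sigma * z_beta / 2) ** 2)

# Type I error condition: (k - 7) * sqrt(n) / sigma = z_alpha.
k = 7 + sigma * z_alpha / sqrt(n)

print(n, round(k, 2))
```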

While deciding to accept $H_o$ or $H_a$, we may make a wrong decision. The probability $\gamma$ of a wrong decision can be computed as follows:

$$\begin{aligned}
\gamma &= P(H_a \text{ accepted and } H_o \text{ is true}) + P(H_o \text{ accepted and } H_a \text{ is true}) \\
&= P(H_a \text{ accepted} \,/\, H_o \text{ is true})\, P(H_o \text{ is true}) + P(H_o \text{ accepted} \,/\, H_a \text{ is true})\, P(H_a \text{ is true}) \\
&= \alpha\, P(H_o \text{ is true}) + \beta\, P(H_a \text{ is true}).
\end{aligned}$$

In most cases, the probabilities $P(H_o \text{ is true})$ and $P(H_a \text{ is true})$ are not known. Therefore, it is, in general, not possible to determine the exact numerical value of the probability $\gamma$ of making a wrong decision. However, since $\gamma$ is a weighted sum of $\alpha$ and $\beta$, and $P(H_o \text{ is true}) + P(H_a \text{ is true}) = 1$, we have

$$\gamma \le \max\{\alpha, \beta\}.$$

A good decision rule (or a good test) is the one which yields the smallest $\gamma$. In view of the above inequality, one will have a small $\gamma$ if the probability of type I error as well as the probability of type II error are small.

The alternative hypothesis is mostly a composite hypothesis. Thus, it is not possible to find a single value for the probability of type II error, $\beta$. For a composite alternative, $\beta$ is a function of $\theta$. That is, $\beta : \Omega_o^c \to [0, 1]$. Here $\Omega_o^c$ denotes the complement of the set $\Omega_o$ in the parameter space $\Omega$. In hypothesis testing, instead of $\beta$, one usually considers the power of the test $1 - \beta(\theta)$, and a small probability of type II error is equivalent to large power of the test.

Definition 18.8. Let $H_o : \theta \in \Omega_o$ and $H_a : \theta \notin \Omega_o$ be the null and alternative hypotheses to be tested based on a random sample $X_1, X_2, ..., X_n$ from a population $X$ with density $f(x; \theta)$, where $\theta$ is a parameter. The power function of a hypothesis test

$$H_o : \theta \in \Omega_o \quad \text{versus} \quad H_a : \theta \notin \Omega_o$$

is a function $\pi : \Omega \to [0, 1]$ defined by

$$\pi(\theta) = \begin{cases} P(\text{Type I Error}) & \text{if } H_o \text{ is true} \\ 1 - P(\text{Type II Error}) & \text{if } H_a \text{ is true.} \end{cases}$$

Example 18.9. A manufacturing firm needs to test the null hypothesis $H_o$ that the probability $p$ of a defective item is 0.1 or less, against the alternative hypothesis $H_a : p > 0.1$. The procedure is to select two items at random. If both are defective, $H_o$ is rejected; otherwise, a third is selected. If the third item is defective, $H_o$ is rejected. In all other cases, $H_o$ is accepted. What is the power of the test in terms of $p$ (if $H_o$ is true)?

Answer: Let $p$ be the probability of a defective item. We want to calculate the power of the test at the null hypothesis. The power function of the test is given by

$$\pi(p) = \begin{cases} P(\text{Type I Error}) & \text{if } p \le 0.1 \\ 1 - P(\text{Type II Error}) & \text{if } p > 0.1. \end{cases}$$

Hence, we have

$$\begin{aligned}
\pi(p) &= P(\text{Reject } H_o \,/\, H_o \text{ is true}) \\
&= P(\text{Reject } H_o \,/\, H_o : p = p) \\
&= P(\text{first two items are both defective} \,/\, p) \\
&\quad + P(\text{at least one of the first two items is not defective and third is} \,/\, p) \\
&= p^2 + (1 - p)^2 p + \binom{2}{1} p (1 - p) p \\
&= p + p^2 - p^3.
\end{aligned}$$

The graph of this power function is shown below.

[Figure: graph of the power function $\pi(p)$.]
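The closed form $\pi(p) = p + p^2 - p^3$ can be verified by enumerating all outcomes of three items. The sketch below is illustrative only: it treats the third item as always inspected, which does not change the rejection probability because its status is irrelevant once the first two items are both defective.

```python
from itertools import product

def power(p):
    """Probability of rejecting H_o when each item is defective with probability p."""
    total = 0.0
    for first, second, third in product([True, False], repeat=3):
        prob = 1.0
        for defective in (first, second, third):
            prob *= p if defective else 1 - p
        # Reject if the first two are both defective, or else if the third one is.
        if (first and second) or third:
            total += prob
    return total

for p in (0.05, 0.1, 0.3, 0.5):
    print(p, round(power(p), 4), round(p + p**2 - p**3, 4))
```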


Remark 18.4. If $X$ denotes the number of independent trials needed to obtain the first success, then $X \sim GEO(p)$, and

$$P(X = k) = (1 - p)^{k-1} p,$$

where $k = 1, 2, 3, ..., \infty$. Further

$$P(X \le n) = 1 - (1 - p)^n$$

since

$$\begin{aligned}
\sum_{k=1}^n (1 - p)^{k-1} p &= p \sum_{k=1}^n (1 - p)^{k-1} \\
&= p\, \frac{1 - (1 - p)^n}{1 - (1 - p)} \\
&= 1 - (1 - p)^n.
\end{aligned}$$

Example 18.10. Let $X$ be the number of independent trials required to obtain a success, where $p$ is the probability of success on each trial. The hypothesis $H_o : p = 0.1$ is to be tested against the alternative $H_a : p = 0.3$. The hypothesis is rejected if $X \le 4$. What is the power of the test if $H_a$ is true?

Answer: The power function is given by

$$\pi(p) = \begin{cases} P(\text{Type I Error}) & \text{if } p = 0.1 \\ 1 - P(\text{Type II Error}) & \text{if } p = 0.3. \end{cases}$$

Hence, we have

$$\begin{aligned}
\pi(0.3) &= 1 - P(\text{Accept } H_o \,/\, H_o \text{ is false}) \\
&= P(\text{Reject } H_o \,/\, H_a \text{ is true}) \\
&= P(X \le 4 \,/\, H_a \text{ is true}) \\
&= P(X \le 4 \,/\, p = 0.3) \\
&= \sum_{k=1}^4 P(X = k \,/\, p = 0.3) \\
&= \sum_{k=1}^4 (1 - p)^{k-1} p \qquad (\text{where } p = 0.3) \\
&= \sum_{k=1}^4 (0.7)^{k-1} (0.3) \\
&= 0.3 \sum_{k=1}^4 (0.7)^{k-1} \\
&= 1 - (0.7)^4 \\
&= 0.7599.
\end{aligned}$$

Hence, the power of the test at the alternative is 0.7599.
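The geometric sum in Example 18.10 is short enough to evaluate term by term. The sketch below does so for both hypothesized values of $p$, which gives the power at the alternative and, for comparison, the type I error probability at the null; the helper name is an illustrative choice.

```python
def geometric_cdf(n, p):
    """P(X <= n) when X ~ GEO(p) counts trials up to the first success."""
    return sum((1 - p) ** (k - 1) * p for k in range(1, n + 1))

power_at_alternative = geometric_cdf(4, 0.3)   # P(X <= 4 / p = 0.3)
type_one_error = geometric_cdf(4, 0.1)         # P(X <= 4 / p = 0.1)

print(round(power_at_alternative, 4), round(type_one_error, 4))
```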

Example 18.11. Let $X_1, X_2, ..., X_{25}$ be a random sample of size 25 drawn from a normal distribution with unknown mean $\mu$ and variance $\sigma^2 = 100$. It is desired to test the null hypothesis $\mu = 4$ against the alternative $\mu = 6$. What is the power at $\mu = 6$ of the test with rejection rule: reject $\mu = 4$ if $\sum_{i=1}^{25} X_i \ge 125$?

Answer: The power of the test at the alternative is

$$\begin{aligned}
\pi(6) &= 1 - P(\text{Type II Error}) \\
&= 1 - P(\text{Accept } H_o \,/\, H_o \text{ is false}) \\
&= P(\text{Reject } H_o \,/\, H_a \text{ is true}) \\
&= P\left( \sum_{i=1}^{25} X_i \ge 125 \,\Big/\, H_a : \mu = 6 \right) \\
&= P\left( \overline{X} \ge 5 \,/\, H_a : \mu = 6 \right) \\
&= P\left( \frac{\overline{X} - 6}{\frac{10}{\sqrt{25}}} \ge \frac{5 - 6}{\frac{10}{\sqrt{25}}} \right) \\
&= P\left( Z \ge -\frac{1}{2} \right) \\
&= 0.6915.
\end{aligned}$$
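The standardization step in Example 18.11 can be delegated to `statistics.NormalDist`, which takes the mean and standard deviation of the sample mean directly. This is a minimal sketch; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, sigma, mu_a = 25, 10.0, 6.0
cutoff_mean = 125 / n                      # reject when the sample mean is at least 5

# Distribution of the sample mean under the alternative mu = 6.
sample_mean = NormalDist(mu=mu_a, sigma=sigma / sqrt(n))
power = 1 - sample_mean.cdf(cutoff_mean)   # P(X-bar >= 5 / mu = 6)

print(round(power, 4))
```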


Example 18.12. An urn contains 7 balls, $\theta$ of which are red. A sample of size 2 is drawn without replacement to test $H_o : \theta \le 1$ against $H_a : \theta > 1$. If the null hypothesis is rejected when one or more red balls are drawn, find the power of the test when $\theta = 2$.

Answer: The power of the test at $\theta = 2$ is given by

$$\begin{aligned}
\pi(2) &= 1 - P(\text{Type II Error}) \\
&= 1 - P(\text{Accept } H_o \,/\, H_o \text{ is false}) \\
&= 1 - P(\text{zero red balls are drawn} \,/\, 2 \text{ balls were red}) \\
&= 1 - \frac{\binom{5}{2}}{\binom{7}{2}} \\
&= 1 - \frac{10}{21} \\
&= \frac{11}{21} \\
&= 0.524.
\end{aligned}$$
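The counting argument in Example 18.12 extends to any number of red balls, which gives the whole power function of the test. The sketch below uses exact fractions from the Python standard library; the function name and default arguments are illustrative.

```python
from fractions import Fraction
from math import comb

def power(red, total=7, draws=2):
    """P(at least one red ball is drawn) when `red` of the `total` balls are red."""
    no_red = Fraction(comb(total - red, draws), comb(total, draws))
    return 1 - no_red

for red in range(0, 4):
    print(red, power(red), float(power(red)))
```

Evaluating the function at $\theta = 0$ and $\theta = 1$ gives the type I error probabilities over the null hypothesis, while values of $\theta$ greater than 1 give the power at the alternatives.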


In all of these examples, we have seen that if the rule for rejection of the null hypothesis $H_o$ is given, then one can compute the significance level or power function of the hypothesis test. The rejection rule is given in terms of a statistic $W(X_1, X_2, ..., X_n)$ of the sample $X_1, X_2, ..., X_n$. For instance, in Example 18.5, the rejection rule was: "Reject the null hypothesis $H_o$ if $\sum_{i=1}^{20} X_i \le 6$." Similarly, in Example 18.7, the rejection rule was: "Reject $H_o$ if and only if $X_1 + X_2 + X_3 + X_4 < -20$", and so on. The statistic $W$, used in the statement of the rejection rule, partitions the set $S^n$ into two subsets, where $S$ denotes the support of the density function of the population $X$. One subset is called the rejection or critical region and the other subset is called the acceptance region. The rejection rule is obtained in such a way that the probability of the type I error is as small as possible and the power of the test at the alternative is as large as possible.

Next, we give two definitions that will lead us to the definition of a uniformly most powerful test.

Definition 18.9. Given $0 \le \delta \le 1$, a test (or test procedure) $T$ for testing the null hypothesis $H_o : \theta \in \Omega_o$ against the alternative $H_a : \theta \in \Omega_a$ is said to be a test of level $\delta$ if

$$\max_{\theta \in \Omega_o} \pi(\theta) \le \delta,$$

where $\pi(\theta)$ denotes the power function of the test $T$.

Definition 18.10. Given $0 \le \delta \le 1$, a test (or test procedure) for testing the null hypothesis $H_o : \theta \in \Omega_o$ against the alternative $H_a : \theta \in \Omega_a$ is said to be a test of size $\delta$ if

$$\max_{\theta \in \Omega_o} \pi(\theta) = \delta.$$

Definition 18.11. Let $T$ be a test procedure for testing the null hypothesis $H_o : \theta \in \Omega_o$ against the alternative $H_a : \theta \in \Omega_a$. The test (or test procedure) $T$ is said to be the uniformly most powerful (UMP) test of level $\delta$ if $T$ is of level $\delta$ and for any other test $W$ of level $\delta$,

$$\pi_T(\theta) \ge \pi_W(\theta)$$

for all $\theta \in \Omega_a$. Here $\pi_T(\theta)$ and $\pi_W(\theta)$ denote the power functions of tests $T$ and $W$, respectively.

Remark 18.5. If $T$ is a test procedure for testing $H_o : \theta = \theta_o$ against $H_a : \theta = \theta_a$ based on sample data $x_1, ..., x_n$ from a population $X$ with a continuous probability density function $f(x; \theta)$, then there is a critical region $C$ associated with the test procedure $T$, and the power function of $T$ can be computed as

$$\pi_T = \int_C L(\theta_a, x_1, ..., x_n)\, dx_1 \cdots dx_n.$$

Similarly, the size of a critical region $C$, say $\alpha$, can be given by

$$\alpha = \int_C L(\theta_o, x_1, ..., x_n)\, dx_1 \cdots dx_n.$$

The following famous result tells us which tests are uniformly most powerful
if the null hypothesis and the alternative hypothesis are both simple.

Theorem 18.1 (Neyman-Pearson). Let $X_1, X_2, \ldots, X_n$ be a random sample
from a population with probability density function $f(x; \theta)$. Let

$$L(\theta, x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i; \theta)$$

be the likelihood function of the sample. Then any critical region $C$ of the
form

$$C = \left\{ (x_1, x_2, \ldots, x_n) \;\Big|\; \frac{L(\theta_o, x_1, \ldots, x_n)}{L(\theta_a, x_1, \ldots, x_n)} \le k \right\}$$

for some constant $0 \le k < \infty$ is best (or uniformly most powerful) of its size
for testing $H_o : \theta = \theta_o$ against $H_a : \theta = \theta_a$.


Proof: We assume that the population has a continuous probability density
function. If the population has a discrete distribution, the proof can be
appropriately modified by replacing integration by summation.

Let $C$ be the critical region of size $\alpha$ as described in the statement of the
theorem. Let $B$ be any other critical region of size $\alpha$. We want to show that
the power of $C$ is greater than or equal to that of $B$. In view of Remark 18.5,
we would like to show that

$$\int_C L(\theta_a, x_1, \ldots, x_n)\, dx_1 \cdots dx_n \ge \int_B L(\theta_a, x_1, \ldots, x_n)\, dx_1 \cdots dx_n. \qquad (1)$$

Since $C$ and $B$ are both critical regions of size $\alpha$, we have

$$\int_C L(\theta_o, x_1, \ldots, x_n)\, dx_1 \cdots dx_n = \int_B L(\theta_o, x_1, \ldots, x_n)\, dx_1 \cdots dx_n. \qquad (2)$$

The last equality (2) can be written as

$$\int_{C \cap B} L(\theta_o, x_1, \ldots, x_n)\, dx_1 \cdots dx_n + \int_{C \cap B^c} L(\theta_o, x_1, \ldots, x_n)\, dx_1 \cdots dx_n$$
$$= \int_{C \cap B} L(\theta_o, x_1, \ldots, x_n)\, dx_1 \cdots dx_n + \int_{C^c \cap B} L(\theta_o, x_1, \ldots, x_n)\, dx_1 \cdots dx_n$$

since

$$C = (C \cap B) \cup (C \cap B^c) \quad \text{and} \quad B = (C \cap B) \cup (C^c \cap B). \qquad (3)$$

Therefore from the last equality, we have

$$\int_{C \cap B^c} L(\theta_o, x_1, \ldots, x_n)\, dx_1 \cdots dx_n = \int_{C^c \cap B} L(\theta_o, x_1, \ldots, x_n)\, dx_1 \cdots dx_n. \qquad (4)$$

Since

$$C = \left\{ (x_1, x_2, \ldots, x_n) \;\Big|\; \frac{L(\theta_o, x_1, \ldots, x_n)}{L(\theta_a, x_1, \ldots, x_n)} \le k \right\}, \qquad (5)$$

we have

$$L(\theta_a, x_1, \ldots, x_n) \ge \frac{L(\theta_o, x_1, \ldots, x_n)}{k} \qquad (6)$$

on $C$, and

$$L(\theta_a, x_1, \ldots, x_n) < \frac{L(\theta_o, x_1, \ldots, x_n)}{k} \qquad (7)$$

on $C^c$. Therefore from (4), (6) and (7), we have

$$\int_{C \cap B^c} L(\theta_a, x_1, \ldots, x_n)\, dx_1 \cdots dx_n \ge \int_{C \cap B^c} \frac{L(\theta_o, x_1, \ldots, x_n)}{k}\, dx_1 \cdots dx_n$$
$$= \int_{C^c \cap B} \frac{L(\theta_o, x_1, \ldots, x_n)}{k}\, dx_1 \cdots dx_n$$
$$\ge \int_{C^c \cap B} L(\theta_a, x_1, \ldots, x_n)\, dx_1 \cdots dx_n.$$

Thus, we obtain

$$\int_{C \cap B^c} L(\theta_a, x_1, \ldots, x_n)\, dx_1 \cdots dx_n \ge \int_{C^c \cap B} L(\theta_a, x_1, \ldots, x_n)\, dx_1 \cdots dx_n.$$

From (3) and the last inequality, we see that

$$\int_C L(\theta_a, x_1, \ldots, x_n)\, dx_1 \cdots dx_n$$
$$= \int_{C \cap B} L(\theta_a, x_1, \ldots, x_n)\, dx_1 \cdots dx_n + \int_{C \cap B^c} L(\theta_a, x_1, \ldots, x_n)\, dx_1 \cdots dx_n$$
$$\ge \int_{C \cap B} L(\theta_a, x_1, \ldots, x_n)\, dx_1 \cdots dx_n + \int_{C^c \cap B} L(\theta_a, x_1, \ldots, x_n)\, dx_1 \cdots dx_n$$
$$= \int_B L(\theta_a, x_1, \ldots, x_n)\, dx_1 \cdots dx_n$$

and hence the theorem is proved.

Now we give several examples to illustrate the use of this theorem.

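Example 18.13. Let $X$ be a random variable with a density function $f(x)$.
What is the critical region for the best test of

$$H_o : f(x) = \begin{cases} \frac{1}{2} & \text{if } -1 < x < 1 \\ 0 & \text{elsewhere,} \end{cases}$$

against

$$H_a : f(x) = \begin{cases} 1 - |x| & \text{if } -1 < x < 1 \\ 0 & \text{elsewhere,} \end{cases}$$

at the significance size $\alpha = 0.10$?

Answer: We assume that the test is performed with a sample of size 1.
Using the Neyman-Pearson Theorem, the best critical region for the best test at
the significance size $\alpha$ is given by

$$C = \left\{ x \in \mathbb{R} \;\Big|\; \frac{L_o(x)}{L_a(x)} \le k \right\}
= \left\{ x \in \mathbb{R} \;\Big|\; \frac{\frac{1}{2}}{1 - |x|} \le k \right\}
= \left\{ x \in \mathbb{R} \;\Big|\; |x| \le 1 - \frac{1}{2k} \right\}
= \left\{ x \in \mathbb{R} \;\Big|\; \frac{1}{2k} - 1 \le x \le 1 - \frac{1}{2k} \right\}.$$

Since

$$0.1 = P(C)
= P\left( \frac{L_o(X)}{L_a(X)} \le k \,/\, H_o \text{ is true} \right)
= P\left( \frac{\frac{1}{2}}{1 - |X|} \le k \,/\, H_o \text{ is true} \right)$$
$$= P\left( \frac{1}{2k} - 1 \le X \le 1 - \frac{1}{2k} \,/\, H_o \text{ is true} \right)
= \int_{\frac{1}{2k} - 1}^{1 - \frac{1}{2k}} \frac{1}{2}\, dx
= 1 - \frac{1}{2k},$$

we get the critical region $C$ to be

$$C = \{ x \in \mathbb{R} \mid -0.1 \le x \le 0.1 \}.$$

Thus the best critical region is $C = [-0.1, 0.1]$ and the best test is: “Reject
$H_o$ if $-0.1 \le X \le 0.1$”.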
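
The optimality claimed by the Neyman-Pearson Theorem can be checked numerically for this example. The following is a minimal sketch, assuming the null density is $\frac{1}{2}$ on $(-1, 1)$ (the value consistent with the size computation above); the function names and the comparison region $[0.8, 1]$ are illustrative choices, not part of the example.

```python
def triangular_cdf(x):
    """CDF of the H_a density 1 - |x| on (-1, 1)."""
    if x <= -1.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x <= 0.0:
        return 0.5 * (1.0 + x) ** 2
    return 1.0 - 0.5 * (1.0 - x) ** 2


def size_and_power(lo, hi):
    """Size and power of the critical region [lo, hi] contained in [-1, 1]."""
    size = 0.5 * (hi - lo)                           # assumed H_o density is 1/2
    power = triangular_cdf(hi) - triangular_cdf(lo)  # probability under H_a
    return size, power


# Neyman-Pearson region from the example.
best_size, best_power = size_and_power(-0.1, 0.1)
# A hypothetical competing region with the same size.
other_size, other_power = size_and_power(0.8, 1.0)

print("best region :", best_size, best_power)
print("other region:", other_size, other_power)
```

Both regions have size 0.1, but the region $[-0.1, 0.1]$ has the larger power, as the theorem predicts.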

Example 18.14. Suppose $X$ has the density function

$$f(x; \theta) = \begin{cases} (1 + \theta)\, x^{\theta} & \text{if } 0 < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$

Based on a single observed value of $X$, find the most powerful critical region
of size $\alpha = 0.1$ for testing $H_o : \theta = 1$ against $H_a : \theta = 2$.

Answer: By the Neyman-Pearson Theorem, the form of the critical region is
given by

$$C = \left\{ x \in \mathbb{R} \;\Big|\; \frac{L(\theta_o, x)}{L(\theta_a, x)} \le k \right\}
= \left\{ x \in \mathbb{R} \;\Big|\; \frac{(1 + \theta_o)\, x^{\theta_o}}{(1 + \theta_a)\, x^{\theta_a}} \le k \right\}
= \left\{ x \in \mathbb{R} \;\Big|\; \frac{2x}{3x^2} \le k \right\}$$
$$= \left\{ x \in \mathbb{R} \;\Big|\; \frac{1}{x} \le \frac{3}{2} k \right\}
= \{ x \in \mathbb{R} \mid x \ge a \},$$

where $a$ is some constant. Hence the most powerful or best test is of the
form: “Reject $H_o$ if $X \ge a$.”

Since the significance level of the test is given to be $\alpha = 0.1$, the constant
$a$ can be determined. Since

$$0.1 = \alpha
= P(\text{Reject } H_o \,/\, H_o \text{ is true})
= P(X \ge a \,/\, \theta = 1)
= \int_a^1 2x\, dx
= 1 - a^2,$$

hence

$$a^2 = 1 - 0.1 = 0.9.$$

Therefore

$$a = \sqrt{0.9},$$

since $k$ in the Neyman-Pearson Theorem is positive. Hence, the most powerful
test is given by “Reject $H_o$ if $X \ge \sqrt{0.9}$”.
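
As a quick check of this computation, the rejection probability of the test “Reject if $X \ge a$” has a closed form under the density $(1+\theta)x^{\theta}$ on $(0,1)$, namely $1 - a^{1+\theta}$. The sketch below (function name is illustrative) evaluates it at the null value to confirm the size, and at the alternative value to obtain the power.

```python
import math


def reject_probability(a, theta):
    """P(X >= a) when X has density (1 + theta) * x**theta on (0, 1)."""
    return 1.0 - a ** (1.0 + theta)


a = math.sqrt(0.9)
size = reject_probability(a, theta=1)   # probability of rejecting under H_o
power = reject_probability(a, theta=2)  # probability of rejecting under H_a

print("cutoff a =", a)
print("size     =", size)
print("power    =", power)
```

The size comes out as 0.1 (up to rounding), and the power at $\theta = 2$ is $1 - 0.9^{3/2}$, roughly 0.146.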

Example 18.15. Suppose that $X$ is a random variable about which the
hypothesis $H_o : X \sim UNIF(0, 1)$ against $H_a : X \sim N(0, 1)$ is to be tested.
What is the most powerful test with a significance level $\alpha = 0.05$ based on
one observation of $X$?

Answer: By the Neyman-Pearson Theorem, the form of the critical region is
given by

$$C = \left\{ x \in \mathbb{R} \;\Big|\; \frac{L_o(x)}{L_a(x)} \le k \right\}
= \left\{ x \in \mathbb{R} \;\Big|\; \sqrt{2\pi}\, e^{\frac{1}{2} x^2} \le k \right\}
= \left\{ x \in \mathbb{R} \;\Big|\; x^2 \le 2 \ln\left( \frac{k}{\sqrt{2\pi}} \right) \right\}
= \{ x \in \mathbb{R} \mid x \le a \},$$

where $a$ is some constant. Hence the most powerful or best test is of the
form: “Reject $H_o$ if $X \le a$.”

Since the significance level of the test is given to be $\alpha = 0.05$, the
constant $a$ can be determined. Since

$$0.05 = \alpha
= P(\text{Reject } H_o \,/\, H_o \text{ is true})
= P(X \le a \,/\, X \sim UNIF(0, 1))
= \int_0^a dx
= a,$$

hence $a = 0.05$. Thus, the most powerful critical region is given by

$$C = \{ x \in \mathbb{R} \mid 0 < x \le 0.05 \}$$

based on the support of the uniform distribution on the open interval $(0, 1)$.
Since the support of this uniform distribution is the interval $(0, 1)$, the
acceptance region (or the complement of $C$ in $(0, 1)$) is

$$C^c = \{ x \in \mathbb{R} \mid 0.05 < x < 1 \}.$$

However, since the support of the standard normal distribution is $\mathbb{R}$, the
actual critical region should be the complement of $C^c$ in $\mathbb{R}$. Therefore, the
critical region of this hypothesis test is the set

$$\{ x \in \mathbb{R} \mid x \le 0.05 \text{ or } x \ge 1 \}.$$

The most powerful test for $\alpha = 0.05$ is: “Reject $H_o$ if $X \le 0.05$ or $X \ge 1$.”
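
The size and power of this final critical region can be evaluated directly. This is a small sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Under H_o, X is uniform on (0, 1): the event {X >= 1} has probability 0,
# so only the piece (0, 0.05] of the critical region contributes to the size.
size = 0.05

# Under H_a, X is standard normal: both pieces of the region contribute.
power = std_normal.cdf(0.05) + (1.0 - std_normal.cdf(1.0))

print("size  =", size)
print("power =", power)
```

Under these assumptions the power is about 0.68, most of it coming from the half-line $x \le 0.05$ on which the uniform density is largely zero.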

Example 18.16. Let $X_1, X_2, X_3$ denote three independent observations
from a distribution with density

$$f(x; \theta) = \begin{cases} (1 + \theta)\, x^{\theta} & \text{if } 0 < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$

What is the form of the best critical region of size 0.034 for testing $H_o : \theta = 1$
versus $H_a : \theta = 2$?

Answer: By the Neyman-Pearson Theorem, the form of the critical region is
given by (with $\theta_o = 1$ and $\theta_a = 2$)

$$C = \left\{ (x_1, x_2, x_3) \in \mathbb{R}^3 \;\Big|\; \frac{L(\theta_o, x_1, x_2, x_3)}{L(\theta_a, x_1, x_2, x_3)} \le k \right\}
= \left\{ (x_1, x_2, x_3) \in \mathbb{R}^3 \;\Big|\; \frac{(1 + \theta_o)^3 \prod_{i=1}^{3} x_i^{\theta_o}}{(1 + \theta_a)^3 \prod_{i=1}^{3} x_i^{\theta_a}} \le k \right\}$$
$$= \left\{ (x_1, x_2, x_3) \in \mathbb{R}^3 \;\Big|\; \frac{8\, x_1 x_2 x_3}{27\, x_1^2 x_2^2 x_3^2} \le k \right\}
= \left\{ (x_1, x_2, x_3) \in \mathbb{R}^3 \;\Big|\; \frac{1}{x_1 x_2 x_3} \le \frac{27}{8} k \right\}
= \{ (x_1, x_2, x_3) \in \mathbb{R}^3 \mid x_1 x_2 x_3 \ge a \},$$

where $a$ is some constant. Hence the most powerful or best test is of the
form: “Reject $H_o$ if $\prod_{i=1}^{3} X_i \ge a$.”

Since the significance level of the test is given to be $\alpha = 0.034$, the
constant $a$ can be determined. To evaluate the constant $a$, we need the
probability distribution of $X_1 X_2 X_3$. The distribution of $X_1 X_2 X_3$ is not
easy to get. Hence, we will use Theorem 17.5. There, we have shown that
$-2(1 + \theta) \sum_{i=1}^{3} \ln X_i \sim \chi^2(6)$. Now we proceed to find $a$. Since

$$0.034 = \alpha
= P(\text{Reject } H_o \,/\, H_o \text{ is true})
= P(X_1 X_2 X_3 \ge a \,/\, \theta = 1)
= P(\ln(X_1 X_2 X_3) \ge \ln a \,/\, \theta = 1)$$
$$= P(-2(1 + \theta) \ln(X_1 X_2 X_3) \le -2(1 + \theta) \ln a \,/\, \theta = 1)
= P(-4 \ln(X_1 X_2 X_3) \le -4 \ln a)
= P(\chi^2(6) \le -4 \ln a),$$

hence from the chi-square table, we get

$$-4 \ln a = 1.4.$$

Therefore

$$a = e^{-0.35} = 0.7047.$$

Hence, the most powerful test is given by “Reject $H_o$ if $X_1 X_2 X_3 \ge 0.7047$”.

The critical region $C$ is the region above the surface $x_1 x_2 x_3 = 0.7047$ of
the unit cube $[0, 1]^3$. The following figure illustrates this region.

[Figure: Critical region is to the right of the shaded surface]
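
The table lookup $-4 \ln a = 1.4$ can be reproduced without a table, because a chi-square distribution with an even number of degrees of freedom has a closed-form distribution function. The helper below is a sketch written for this check (its name is not from the text) and handles even degrees of freedom only.

```python
import math


def chi2_cdf_even(x, df):
    """CDF of a chi-square variable with an even number of degrees of freedom.

    Uses P(chi2(2m) <= x) = 1 - exp(-x/2) * sum_{j<m} (x/2)**j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 0.0
    m = df // 2
    half = x / 2.0
    tail = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(m))
    return 1.0 - tail


a = math.exp(-0.35)                            # cutoff found in the example
size = chi2_cdf_even(-4.0 * math.log(a), 6)    # P(chi2(6) <= -4 ln a)

print("cutoff a =", a)
print("size     =", size)
```

The computed size agrees with the stated 0.034 to three decimal places, and the cutoff agrees with 0.7047.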


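Example 18.17. Let $X_1, X_2, \ldots, X_{12}$ be a random sample from a normal
population with mean zero and variance $\sigma^2$. What is the most powerful test
of size 0.025 for testing the null hypothesis $H_o : \sigma^2 = 10$ versus $H_a : \sigma^2 = 5$?

Answer: By the Neyman-Pearson Theorem, the form of the critical region is
given by (with $\sigma_o^2 = 10$ and $\sigma_a^2 = 5$)

$$C = \left\{ (x_1, x_2, \ldots, x_{12}) \in \mathbb{R}^{12} \;\Big|\; \frac{L(\sigma_o^2, x_1, x_2, \ldots, x_{12})}{L(\sigma_a^2, x_1, x_2, \ldots, x_{12})} \le k \right\}$$
$$= \left\{ (x_1, x_2, \ldots, x_{12}) \in \mathbb{R}^{12} \;\Big|\; \prod_{i=1}^{12} \frac{\frac{1}{\sqrt{2\pi\sigma_o^2}}\, e^{-\frac{1}{2} \left( \frac{x_i}{\sigma_o} \right)^2}}{\frac{1}{\sqrt{2\pi\sigma_a^2}}\, e^{-\frac{1}{2} \left( \frac{x_i}{\sigma_a} \right)^2}} \le k \right\}$$
$$= \left\{ (x_1, x_2, \ldots, x_{12}) \in \mathbb{R}^{12} \;\Big|\; \left( \frac{1}{2} \right)^6 e^{\frac{1}{20} \sum_{i=1}^{12} x_i^2} \le k \right\}
= \left\{ (x_1, x_2, \ldots, x_{12}) \in \mathbb{R}^{12} \;\Big|\; \sum_{i=1}^{12} x_i^2 \le a \right\},$$

where $a$ is some constant. Hence the most powerful or best test is of the
form: “Reject $H_o$ if $\sum_{i=1}^{12} X_i^2 \le a$.”

Since the significance level of the test is given to be $\alpha = 0.025$, the
constant $a$ can be determined. To evaluate the constant $a$, we need the
probability distribution of $X_1^2 + X_2^2 + \cdots + X_{12}^2$. It can be shown that
$\sum_{i=1}^{12} \left( \frac{X_i}{\sigma} \right)^2 \sim \chi^2(12)$. Now we proceed to find $a$. Since

$$0.025 = \alpha
= P(\text{Reject } H_o \,/\, H_o \text{ is true})
= P\left( \sum_{i=1}^{12} X_i^2 \le a \,/\, \sigma^2 = 10 \right)
= P\left( \sum_{i=1}^{12} \left( \frac{X_i}{\sqrt{10}} \right)^2 \le \frac{a}{10} \right)
= P\left( \chi^2(12) \le \frac{a}{10} \right),$$

hence from the chi-square table, we get

$$\frac{a}{10} = 4.4.$$

Therefore

$$a = 44.$$

Hence, the most powerful test is given by “Reject $H_o$ if $\sum_{i=1}^{12} X_i^2 \le 44$.” The
best critical region of size 0.025 is given by

$$C = \left\{ (x_1, x_2, \ldots, x_{12}) \in \mathbb{R}^{12} \;\Big|\; \sum_{i=1}^{12} x_i^2 \le 44 \right\}.$$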
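
The cutoff $a = 44$ rests on the 0.025 quantile of $\chi^2(12)$. As a hedged check, the sketch below recomputes that quantile by bisection on the closed-form chi-square distribution function for even degrees of freedom; both helper functions are written for this illustration and are not part of the text.

```python
import math


def chi2_cdf_even(x, df):
    """CDF of chi-square with even df: 1 - exp(-x/2) * sum_{j<df/2} (x/2)**j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 0.0
    half = x / 2.0
    terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return 1.0 - math.exp(-half) * terms


def chi2_quantile_even(p, df, upper=200.0):
    """Solve chi2_cdf_even(q, df) = p for q by bisection on [0, upper]."""
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


sigma0_sq = 10.0
quantile = chi2_quantile_even(0.025, 12)
a = sigma0_sq * quantile                      # cutoff for the sum of squares
size_at_44 = chi2_cdf_even(44.0 / sigma0_sq, 12)

print("chi-square quantile =", quantile)
print("cutoff a            =", a)
print("size at a = 44      =", size_at_44)
```

The bisection gives a quantile very close to the tabulated 4.4, so the rounded cutoff 44 yields a size that matches 0.025 to three decimals.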


In the last five examples, we found the most powerful tests and the
corresponding critical regions when both $H_o$ and $H_a$ are simple hypotheses. If
either $H_o$ or $H_a$ is not simple, then it is not always possible to find the most
powerful test and corresponding critical region. In this situation, a hypothesis
test is found by using the likelihood ratio. A test obtained by using the likelihood
ratio is called the likelihood ratio test and the corresponding critical region is
called the likelihood ratio critical region.

18.4. Some Examples of Likelihood Ratio Tests 


In this section, we illustrate, using the likelihood ratio, how one can construct
a hypothesis test when one of the hypotheses is not simple. As pointed out
earlier, the test we will construct using the likelihood ratio is not necessarily
the most powerful test. However, such a test has all the desirable properties of a
hypothesis test. To construct the test one has to follow a sequence of steps.
These steps are outlined below:

(1) Find the likelihood function $L(\theta, x_1, x_2, \ldots, x_n)$ for the given sample.

(2) Evaluate $\max_{\theta \in \Omega_o} L(\theta, x_1, x_2, \ldots, x_n)$.

(3) Find the maximum likelihood estimator $\widehat{\theta}$ of $\theta$.

(4) Compute $\max_{\theta \in \Omega} L(\theta, x_1, x_2, \ldots, x_n)$ using $L(\widehat{\theta}, x_1, x_2, \ldots, x_n)$.

(5) Using steps (2) and (4), find
$$W(x_1, \ldots, x_n) = \frac{\max_{\theta \in \Omega_o} L(\theta, x_1, x_2, \ldots, x_n)}{\max_{\theta \in \Omega} L(\theta, x_1, x_2, \ldots, x_n)}.$$

(6) Using step (5) determine $C = \{ (x_1, x_2, \ldots, x_n) \mid W(x_1, \ldots, x_n) \le k \}$,
where $k \in [0, 1]$.

(7) Reduce $W(x_1, \ldots, x_n) \le k$ to an equivalent inequality $\widehat{W}(x_1, \ldots, x_n) \le A$.

(8) Determine the distribution of $\widehat{W}(x_1, \ldots, x_n)$.

(9) Find $A$ such that the given $\alpha$ equals $P\left( \widehat{W}(x_1, \ldots, x_n) \le A \mid H_o \text{ is true} \right)$.

In the remaining examples, for notational simplicity, we will denote the
likelihood function $L(\theta, x_1, x_2, \ldots, x_n)$ simply as $L(\theta)$.
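
Steps (1) through (5) can be sketched numerically. The example below is only an illustration under stated assumptions: it takes a normal population with known standard deviation and a simple null value for the mean, uses a small made-up sample, and compares the ratio computed from the two maximized likelihoods with the closed form $e^{-\frac{n}{2\sigma^2}(\overline{x} - \mu_o)^2}$ that this model is known to give.

```python
import math


def log_likelihood(mu, xs, sigma):
    """Step (1): log of the normal likelihood with known sigma."""
    n = len(xs)
    const = -n * math.log(sigma * math.sqrt(2.0 * math.pi))
    return const - sum((x - mu) ** 2 for x in xs) / (2.0 * sigma ** 2)


def likelihood_ratio(xs, mu0, sigma):
    """Steps (2)-(5): W = max over the null set / max over the full set."""
    log_num = log_likelihood(mu0, xs, sigma)     # step (2): null set is {mu0}
    mu_hat = sum(xs) / len(xs)                   # step (3): MLE is the sample mean
    log_den = log_likelihood(mu_hat, xs, sigma)  # step (4)
    return math.exp(log_num - log_den)           # step (5)


# Hypothetical data, chosen only for illustration.
sample = [4.9, 5.6, 5.1, 6.2, 5.4]
mu0, sigma = 5.0, 1.0

W = likelihood_ratio(sample, mu0, sigma)
xbar = sum(sample) / len(sample)
closed_form = math.exp(-len(sample) * (xbar - mu0) ** 2 / (2.0 * sigma ** 2))

print("W           =", W)
print("closed form =", closed_form)
```

Working with log-likelihoods and exponentiating only the difference avoids underflow for larger samples; $W$ always lies in $(0, 1]$ because the null set is contained in the full parameter set.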

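Example 18.19. Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal
population with mean $\mu$ and known variance $\sigma^2$. What is the likelihood
ratio test of size $\alpha$ for testing the null hypothesis $H_o : \mu = \mu_o$ versus the
alternative hypothesis $H_a : \mu \ne \mu_o$?

Answer: The likelihood function of the sample is given by

$$L(\mu) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x_i - \mu)^2}
= \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^n e^{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2}.$$

Since $\Omega_o = \{ \mu_o \}$, we obtain

$$\max_{\mu \in \Omega_o} L(\mu) = \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^n e^{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu_o)^2}.$$

We have seen in Example 15.13 that if $X \sim N(\mu, \sigma^2)$, then the maximum
likelihood estimator of $\mu$ is $\overline{X}$, that is

$$\widehat{\mu} = \overline{X}.$$

Hence

$$\max_{\mu \in \Omega} L(\mu) = L(\widehat{\mu}) = \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^n e^{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \overline{x})^2}.$$

Now the likelihood ratio statistic $W(x_1, x_2, \ldots, x_n)$ is given by

$$W(x_1, x_2, \ldots, x_n) = \frac{\left( \frac{1}{\sigma\sqrt{2\pi}} \right)^n e^{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu_o)^2}}{\left( \frac{1}{\sigma\sqrt{2\pi}} \right)^n e^{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \overline{x})^2}},$$

which simplifies to

$$W(x_1, x_2, \ldots, x_n) = e^{-\frac{n}{2\sigma^2} (\overline{x} - \mu_o)^2}.$$

Now the inequality $W(x_1, x_2, \ldots, x_n) \le k$ becomes

$$e^{-\frac{n}{2\sigma^2} (\overline{x} - \mu_o)^2} \le k,$$

which can be rewritten as

$$(\overline{x} - \mu_o)^2 \ge -\frac{2\sigma^2}{n} \ln(k)$$

or

$$|\overline{x} - \mu_o| \ge K,$$

where $K = \sqrt{-\frac{2\sigma^2}{n} \ln(k)}$. In view of the above inequality, the critical region
can be described as

$$C = \{ (x_1, x_2, \ldots, x_n) \mid |\overline{x} - \mu_o| \ge K \}.$$

Since we are given the size of the critical region to be $\alpha$, we can determine
the constant $K$. Since the size of the critical region is $\alpha$, we have

$$\alpha = P\left( |\overline{X} - \mu_o| \ge K \right).$$

For finding $K$, we need the probability density function of the statistic $\overline{X} - \mu_o$
when the population $X$ is $N(\mu, \sigma^2)$ and the null hypothesis $H_o : \mu = \mu_o$ is
true. Since $\sigma^2$ is known and $X_i \sim N(\mu, \sigma^2)$,

$$\frac{\overline{X} - \mu_o}{\frac{\sigma}{\sqrt{n}}} \sim N(0, 1)$$

and

$$\alpha = P\left( |\overline{X} - \mu_o| \ge K \right)
= P\left( \left| \frac{\overline{X} - \mu_o}{\frac{\sigma}{\sqrt{n}}} \right| \ge K \frac{\sqrt{n}}{\sigma} \right)
= P\left( |Z| \ge K \frac{\sqrt{n}}{\sigma} \right),$$

where

$$Z = \frac{\overline{X} - \mu_o}{\frac{\sigma}{\sqrt{n}}}.$$

From this we get

$$z_{\frac{\alpha}{2}} = K \frac{\sqrt{n}}{\sigma},$$

which is

$$K = z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}},$$

where $z_{\frac{\alpha}{2}}$ is a real number such that the integral of the standard normal
density from $z_{\frac{\alpha}{2}}$ to $\infty$ equals $\frac{\alpha}{2}$.

Hence, the likelihood ratio test is given by “Reject $H_o$ if

$$|\overline{X} - \mu_o| \ge z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}.”$$

If we denote

$$z = \frac{\overline{x} - \mu_o}{\frac{\sigma}{\sqrt{n}}},$$

then the above inequality becomes

$$|Z| \ge z_{\frac{\alpha}{2}}.$$

Thus the critical region is given by

$$C = \left\{ (x_1, x_2, \ldots, x_n) \mid |z| \ge z_{\frac{\alpha}{2}} \right\}.$$

This tells us that the null hypothesis must be rejected when the absolute
value of $z$ takes on a value greater than or equal to $z_{\frac{\alpha}{2}}$.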
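
The resulting two-sided test is easy to carry out in code. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the function name and the sample values are hypothetical and serve only to illustrate the decision rule.

```python
import math
from statistics import NormalDist


def z_test_two_sided(xs, mu0, sigma, alpha):
    """Return (z, z_crit, reject) for H_o: mu = mu0 vs H_a: mu != mu0, sigma known."""
    n = len(xs)
    xbar = sum(xs) / n
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # z_crit leaves probability alpha/2 in the upper tail of N(0, 1).
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z, z_crit, abs(z) >= z_crit


# Hypothetical sample, for illustration only.
sample = [4.9, 5.6, 5.1, 6.2, 5.4]
z, z_crit, reject = z_test_two_sided(sample, mu0=5.0, sigma=1.0, alpha=0.05)

print("z      =", z)
print("z_crit =", z_crit)
print("reject =", reject)
```

For this sample the statistic is below the critical value of about 1.96, so the null hypothesis is not rejected at the 0.05 level.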



Remark 18.6. The hypothesis $H_a : \mu \ne \mu_o$ is called a two-sided alternative
hypothesis. An alternative hypothesis of the form $H_a : \mu > \mu_o$ is called
a right-sided alternative. Similarly, $H_a : \mu < \mu_o$ is called a left-sided
alternative. In the above example, if we had a right-sided alternative, that
is $H_a : \mu > \mu_o$, then the critical region would have been

$$C = \{ (x_1, x_2, \ldots, x_n) \mid z \ge z_\alpha \}.$$

Similarly, if the alternative had been left-sided, that is $H_a : \mu < \mu_o$,
then the critical region would have been

$$C = \{ (x_1, x_2, \ldots, x_n) \mid z \le -z_\alpha \}.$$

We summarize the three cases of hypothesis tests of the mean (of the normal
population with known variance) in the following table.

$H_o$              $H_a$               Critical Region (or Test)

$\mu = \mu_o$      $\mu > \mu_o$       $z = \dfrac{\overline{x} - \mu_o}{\sigma / \sqrt{n}} \ge z_\alpha$

$\mu = \mu_o$      $\mu < \mu_o$       $z = \dfrac{\overline{x} - \mu_o}{\sigma / \sqrt{n}} \le -z_\alpha$

$\mu = \mu_o$      $\mu \ne \mu_o$     $|z| = \left| \dfrac{\overline{x} - \mu_o}{\sigma / \sqrt{n}} \right| \ge z_{\frac{\alpha}{2}}$

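Example 18.20. Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal
population with mean $\mu$ and unknown variance $\sigma^2$. What is the likelihood
ratio test of size $\alpha$ for testing the null hypothesis $H_o : \mu = \mu_o$ versus the
alternative hypothesis $H_a : \mu \ne \mu_o$?

Answer: In this example,

$$\Omega = \{ (\mu, \sigma^2) \in \mathbb{R}^2 \mid -\infty < \mu < \infty,\ \sigma^2 > 0 \},$$
$$\Omega_o = \{ (\mu_o, \sigma^2) \in \mathbb{R}^2 \mid \sigma^2 > 0 \},$$
$$\Omega_a = \{ (\mu, \sigma^2) \in \mathbb{R}^2 \mid \mu \ne \mu_o,\ \sigma^2 > 0 \}.$$

These sets are illustrated below.

[Figure: Graphs of $\Omega$ and $\Omega_o$]

The likelihood function is given by

$$L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2} \left( \frac{x_i - \mu}{\sigma} \right)^2}
= \left( \frac{1}{\sqrt{2\pi\sigma^2}} \right)^n e^{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2}.$$

Next, we find the maximum of $L(\mu, \sigma^2)$ on the set $\Omega_o$. Since the set $\Omega_o$ is
equal to $\{ (\mu_o, \sigma^2) \in \mathbb{R}^2 \mid 0 < \sigma < \infty \}$, we have

$$\max_{(\mu, \sigma^2) \in \Omega_o} L(\mu, \sigma^2) = \max_{\sigma^2 > 0} L(\mu_o, \sigma^2).$$

Since $L(\mu_o, \sigma^2)$ and $\ln L(\mu_o, \sigma^2)$ achieve the maximum at the same $\sigma$ value,
we determine the value of $\sigma$ where $\ln L(\mu_o, \sigma^2)$ achieves the maximum. Taking
the natural logarithm of the likelihood function, we get

$$\ln L(\mu_o, \sigma^2) = -\frac{n}{2} \ln(\sigma^2) - \frac{n}{2} \ln(2\pi) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu_o)^2.$$

Differentiating $\ln L(\mu_o, \sigma^2)$ with respect to $\sigma^2$, we get from the last equality

$$\frac{d}{d\sigma^2} \ln L(\mu_o, \sigma^2) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu_o)^2.$$

Setting this derivative to zero and solving for $\sigma$, we obtain

$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_o)^2}.$$

Thus $\ln L(\mu_o, \sigma^2)$ attains its maximum at $\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_o)^2}$. Since this
value of $\sigma$ also yields the maximum value of $L(\mu_o, \sigma^2)$, we have

$$\max_{\sigma^2 > 0} L(\mu_o, \sigma^2) = \left( 2\pi\, \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_o)^2 \right)^{-\frac{n}{2}} e^{-\frac{n}{2}}.$$

Next, we determine the maximum of $L(\mu, \sigma^2)$ on the set $\Omega$. As before,
we consider $\ln L(\mu, \sigma^2)$ to determine where $L(\mu, \sigma^2)$ achieves its maximum.
Taking the natural logarithm of $L(\mu, \sigma^2)$, we obtain

$$\ln L(\mu, \sigma^2) = -\frac{n}{2} \ln(\sigma^2) - \frac{n}{2} \ln(2\pi) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2.$$

Taking the partial derivatives of $\ln L(\mu, \sigma^2)$ first with respect to $\mu$ and then
with respect to $\sigma^2$, we get

$$\frac{\partial}{\partial \mu} \ln L(\mu, \sigma^2) = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)$$

and

$$\frac{\partial}{\partial \sigma^2} \ln L(\mu, \sigma^2) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2,$$

respectively. Setting these partial derivatives to zero and solving for $\mu$ and
$\sigma$, we obtain

$$\mu = \overline{x} \quad \text{and} \quad \sigma^2 = \frac{n - 1}{n}\, s^2,$$

where $s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \overline{x})^2$ is the sample variance.

Substituting these optimal values of $\mu$ and $\sigma$ into $L(\mu, \sigma^2)$, we obtain

$$\max_{(\mu, \sigma^2) \in \Omega} L(\mu, \sigma^2) = \left( 2\pi\, \frac{1}{n} \sum_{i=1}^{n} (x_i - \overline{x})^2 \right)^{-\frac{n}{2}} e^{-\frac{n}{2}}.$$

Hence

$$\frac{\max_{(\mu, \sigma^2) \in \Omega_o} L(\mu, \sigma^2)}{\max_{(\mu, \sigma^2) \in \Omega} L(\mu, \sigma^2)}
= \frac{\left( 2\pi\, \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_o)^2 \right)^{-\frac{n}{2}} e^{-\frac{n}{2}}}{\left( 2\pi\, \frac{1}{n} \sum_{i=1}^{n} (x_i - \overline{x})^2 \right)^{-\frac{n}{2}} e^{-\frac{n}{2}}}
= \left( \frac{\sum_{i=1}^{n} (x_i - \mu_o)^2}{\sum_{i=1}^{n} (x_i - \overline{x})^2} \right)^{-\frac{n}{2}}.$$

Since

$$\sum_{i=1}^{n} (x_i - \overline{x})^2 = (n - 1)\, s^2$$

and

$$\sum_{i=1}^{n} (x_i - \mu_o)^2 = \sum_{i=1}^{n} (x_i - \overline{x})^2 + n (\overline{x} - \mu_o)^2,$$

we get

$$W(x_1, x_2, \ldots, x_n) = \frac{\max_{(\mu, \sigma^2) \in \Omega_o} L(\mu, \sigma^2)}{\max_{(\mu, \sigma^2) \in \Omega} L(\mu, \sigma^2)}
= \left( 1 + \frac{n}{n - 1}\, \frac{(\overline{x} - \mu_o)^2}{s^2} \right)^{-\frac{n}{2}}.$$

Now the inequality $W(x_1, x_2, \ldots, x_n) \le k$ becomes

$$\left( 1 + \frac{n}{n - 1}\, \frac{(\overline{x} - \mu_o)^2}{s^2} \right)^{-\frac{n}{2}} \le k,$$

which can be rewritten as

$$\left( \frac{\overline{x} - \mu_o}{s} \right)^2 \ge \frac{n - 1}{n} \left( k^{-\frac{2}{n}} - 1 \right)$$

or

$$\left| \frac{\overline{x} - \mu_o}{\frac{s}{\sqrt{n}}} \right| \ge K,$$

where $K = \sqrt{(n - 1) \left[ k^{-\frac{2}{n}} - 1 \right]}$. In view of the above inequality, the critical
region can be described as

$$C = \left\{ (x_1, x_2, \ldots, x_n) \;\Big|\; \left| \frac{\overline{x} - \mu_o}{\frac{s}{\sqrt{n}}} \right| \ge K \right\}$$

and the best likelihood ratio test is: “Reject $H_o$ if $\left| \frac{\overline{x} - \mu_o}{s / \sqrt{n}} \right| \ge K$”. Since we
are given the size of the critical region to be $\alpha$, we can find the constant $K$.
For finding $K$, we need the probability density function of the statistic

$$\frac{\overline{X} - \mu_o}{\frac{S}{\sqrt{n}}}$$

when the population $X$ is $N(\mu, \sigma^2)$ and the null hypothesis $H_o : \mu = \mu_o$ is
true.

Since the population is normal with mean $\mu$ and variance $\sigma^2$,

$$\frac{\overline{X} - \mu_o}{\frac{S}{\sqrt{n}}} \sim t(n - 1),$$

where $S^2$ is the sample variance and equals $\frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \overline{X})^2$. Hence

$$K = t_{\frac{\alpha}{2}}(n - 1),$$

where $t_{\frac{\alpha}{2}}(n - 1)$ is a real number such that the integral of the t-distribution
with $n - 1$ degrees of freedom from $t_{\frac{\alpha}{2}}(n - 1)$ to $\infty$ equals $\frac{\alpha}{2}$.

Therefore, the likelihood ratio test is given by “Reject $H_o : \mu = \mu_o$ if

$$|\overline{X} - \mu_o| \ge t_{\frac{\alpha}{2}}(n - 1)\, \frac{S}{\sqrt{n}}.”$$

If we denote

$$t = \frac{\overline{x} - \mu_o}{\frac{s}{\sqrt{n}}},$$

then the above inequality becomes

$$|T| \ge t_{\frac{\alpha}{2}}(n - 1).$$

Thus the critical region is given by

$$C = \left\{ (x_1, x_2, \ldots, x_n) \mid |t| \ge t_{\frac{\alpha}{2}}(n - 1) \right\}.$$

This tells us that the null hypothesis must be rejected when the absolute
value of $t$ takes on a value greater than or equal to $t_{\frac{\alpha}{2}}(n - 1)$.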
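
The key algebraic step in this example is that the likelihood ratio depends on the data only through the $t$ statistic: $W = \left( 1 + \frac{t^2}{n - 1} \right)^{-n/2}$. The sketch below checks this identity numerically on a made-up sample; the function names and the data are illustrative assumptions, and the critical value $t_{\frac{\alpha}{2}}(n - 1)$ would still have to be read from a t-table.

```python
import math


def t_statistic(xs, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)) with s^2 the sample variance."""
    n = len(xs)
    xbar = sum(xs) / n
    s_sq = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return (xbar - mu0) / math.sqrt(s_sq / n)


def likelihood_ratio(xs, mu0):
    """W = (sum (x - mu0)^2 / sum (x - xbar)^2) ** (-n/2)."""
    n = len(xs)
    xbar = sum(xs) / n
    num = sum((x - mu0) ** 2 for x in xs)
    den = sum((x - xbar) ** 2 for x in xs)
    return (num / den) ** (-n / 2.0)


# Hypothetical data, for illustration only.
sample = [4.9, 5.6, 5.1, 6.2, 5.4]
mu0 = 5.0
n = len(sample)

t = t_statistic(sample, mu0)
W = likelihood_ratio(sample, mu0)
W_from_t = (1.0 + t ** 2 / (n - 1)) ** (-n / 2.0)

print("t        =", t)
print("W        =", W)
print("W from t =", W_from_t)
```

Because $W$ is a decreasing function of $|t|$, rejecting for small $W$ is the same as rejecting for large $|t|$, which is why the test reduces to the usual t-test.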



Remark 18.7. In the above example, if we had a right-sided alternative,
that is $H_a : \mu > \mu_o$, then the critical region would have been

$$C = \{ (x_1, x_2, \ldots, x_n) \mid t \ge t_\alpha(n - 1) \}.$$

Similarly, if the alternative had been left-sided, that is $H_a : \mu < \mu_o$,
then the critical region would have been

$$C = \{ (x_1, x_2, \ldots, x_n) \mid t \le -t_\alpha(n - 1) \}.$$

We summarize the three cases of hypothesis tests of the mean (of the normal
population with unknown variance) in the following table.

$H_o$              $H_a$               Critical Region (or Test)

$\mu = \mu_o$      $\mu > \mu_o$       $t = \dfrac{\overline{x} - \mu_o}{s / \sqrt{n}} \ge t_\alpha(n - 1)$

$\mu = \mu_o$      $\mu < \mu_o$       $t = \dfrac{\overline{x} - \mu_o}{s / \sqrt{n}} \le -t_\alpha(n - 1)$

$\mu = \mu_o$      $\mu \ne \mu_o$     $|t| = \left| \dfrac{\overline{x} - \mu_o}{s / \sqrt{n}} \right| \ge t_{\frac{\alpha}{2}}(n - 1)$


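Example 18.21. Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal
population with mean $\mu$ and variance $\sigma^2$. What is the likelihood ratio test
of size $\alpha$ for testing the null hypothesis $H_o : \sigma^2 = \sigma_o^2$ versus
$H_a : \sigma^2 \ne \sigma_o^2$?

Answer: In this example,

$$\Omega = \{ (\mu, \sigma^2) \in \mathbb{R}^2 \mid -\infty < \mu < \infty,\ \sigma^2 > 0 \},$$
$$\Omega_o = \{ (\mu, \sigma_o^2) \in \mathbb{R}^2 \mid -\infty < \mu < \infty \},$$
$$\Omega_a = \{ (\mu, \sigma^2) \in \mathbb{R}^2 \mid -\infty < \mu < \infty,\ \sigma \ne \sigma_o \}.$$

These sets are illustrated below.

[Figure: Graphs of $\Omega$ and $\Omega_o$]

The likelihood function is given by

$$L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2} \left( \frac{x_i - \mu}{\sigma} \right)^2}
= \left( \frac{1}{\sqrt{2\pi\sigma^2}} \right)^n e^{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2}.$$

Next, we find the maximum of $L(\mu, \sigma^2)$ on the set $\Omega_o$. Since the set $\Omega_o$ is
equal to $\{ (\mu, \sigma_o^2) \in \mathbb{R}^2 \mid -\infty < \mu < \infty \}$, we have

$$\max_{(\mu, \sigma^2) \in \Omega_o} L(\mu, \sigma^2) = \max_{-\infty < \mu < \infty} L(\mu, \sigma_o^2).$$

Since $L(\mu, \sigma_o^2)$ and $\ln L(\mu, \sigma_o^2)$ achieve the maximum at the same $\mu$ value, we
determine the value of $\mu$ where $\ln L(\mu, \sigma_o^2)$ achieves the maximum. Taking
the natural logarithm of the likelihood function, we get

$$\ln L(\mu, \sigma_o^2) = -\frac{n}{2} \ln(\sigma_o^2) - \frac{n}{2} \ln(2\pi) - \frac{1}{2\sigma_o^2} \sum_{i=1}^{n} (x_i - \mu)^2.$$

Differentiating $\ln L(\mu, \sigma_o^2)$ with respect to $\mu$, we get from the last equality

$$\frac{d}{d\mu} \ln L(\mu, \sigma_o^2) = \frac{1}{\sigma_o^2} \sum_{i=1}^{n} (x_i - \mu).$$

Setting this derivative to zero and solving for $\mu$, we obtain

$$\mu = \overline{x}.$$

Hence, we obtain

$$\max_{-\infty < \mu < \infty} L(\mu, \sigma_o^2) = \left( \frac{1}{2\pi\sigma_o^2} \right)^{\frac{n}{2}} e^{-\frac{1}{2\sigma_o^2} \sum_{i=1}^{n} (x_i - \overline{x})^2}.$$

Next, we determine the maximum of $L(\mu, \sigma^2)$ on the set $\Omega$. As before,
we consider $\ln L(\mu, \sigma^2)$ to determine where $L(\mu, \sigma^2)$ achieves its maximum.
Taking the natural logarithm of $L(\mu, \sigma^2)$, we obtain

$$\ln L(\mu, \sigma^2) = -n \ln(\sigma) - \frac{n}{2} \ln(2\pi) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2.$$

Taking the partial derivatives of $\ln L(\mu, \sigma^2)$ first with respect to $\mu$ and then
with respect to $\sigma^2$, we get

$$\frac{\partial}{\partial \mu} \ln L(\mu, \sigma^2) = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)$$

and

$$\frac{\partial}{\partial \sigma^2} \ln L(\mu, \sigma^2) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2,$$

respectively. Setting these partial derivatives to zero and solving for $\mu$ and
$\sigma$, we obtain

$$\mu = \overline{x} \quad \text{and} \quad \sigma^2 = \frac{n - 1}{n}\, s^2,$$

where $s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \overline{x})^2$ is the sample variance.

Substituting these optimal values of $\mu$ and $\sigma$ into $L(\mu, \sigma^2)$, we obtain

$$\max_{(\mu, \sigma^2) \in \Omega} L(\mu, \sigma^2) = \left( \frac{n}{2\pi (n - 1) s^2} \right)^{\frac{n}{2}} e^{-\frac{n}{2}}.$$

Therefore

$$W(x_1, x_2, \ldots, x_n) = \frac{\max_{(\mu, \sigma^2) \in \Omega_o} L(\mu, \sigma^2)}{\max_{(\mu, \sigma^2) \in \Omega} L(\mu, \sigma^2)}
= \frac{\left( \frac{1}{2\pi\sigma_o^2} \right)^{\frac{n}{2}} e^{-\frac{1}{2\sigma_o^2} \sum_{i=1}^{n} (x_i - \overline{x})^2}}{\left( \frac{n}{2\pi (n - 1) s^2} \right)^{\frac{n}{2}} e^{-\frac{n}{2}}}
= n^{-\frac{n}{2}}\, e^{\frac{n}{2}} \left( \frac{(n - 1) s^2}{\sigma_o^2} \right)^{\frac{n}{2}} e^{-\frac{(n - 1) s^2}{2\sigma_o^2}}.$$

Now the inequality $W(x_1, x_2, \ldots, x_n) \le k$ becomes

$$n^{-\frac{n}{2}}\, e^{\frac{n}{2}} \left( \frac{(n - 1) s^2}{\sigma_o^2} \right)^{\frac{n}{2}} e^{-\frac{(n - 1) s^2}{2\sigma_o^2}} \le k,$$

which is equivalent to

$$\left( \frac{(n - 1) s^2}{\sigma_o^2} \right)^{n} e^{-\frac{(n - 1) s^2}{\sigma_o^2}} \le \left( k \left( \frac{n}{e} \right)^{\frac{n}{2}} \right)^2 := K_o,$$

where $K_o$ is a constant. Let $H$ be a function defined by

$$H(w) = w^n e^{-w}.$$

Using this, we see that the above inequality becomes

$$H\left( \frac{(n - 1) s^2}{\sigma_o^2} \right) \le K_o.$$

The figure below illustrates this inequality.

[Figure: graph of $H(w)$ with the level $K_o$]

From this it follows that

$$\frac{(n - 1) s^2}{\sigma_o^2} \le K_1 \quad \text{or} \quad \frac{(n - 1) s^2}{\sigma_o^2} \ge K_2.$$

In view of these inequalities, the critical region can be described as

$$C = \left\{ (x_1, x_2, \ldots, x_n) \;\Big|\; \frac{(n - 1) s^2}{\sigma_o^2} \le K_1 \ \text{or}\ \frac{(n - 1) s^2}{\sigma_o^2} \ge K_2 \right\},$$

and the best likelihood ratio test is: “Reject $H_o$ if

$$\frac{(n - 1) S^2}{\sigma_o^2} \le K_1 \quad \text{or} \quad \frac{(n - 1) S^2}{\sigma_o^2} \ge K_2.”$$

Since we are given the size of the critical region to be $\alpha$, we can determine the
constants $K_1$ and $K_2$. As the sample $X_1, X_2, \ldots, X_n$ is taken from a normal
distribution with mean $\mu$ and variance $\sigma^2$, we get

$$\frac{(n - 1) S^2}{\sigma_o^2} \sim \chi^2(n - 1)$$

when the null hypothesis $H_o : \sigma^2 = \sigma_o^2$ is true.

Therefore, the likelihood ratio critical region $C$ becomes

$$\left\{ (x_1, x_2, \ldots, x_n) \;\Big|\; \frac{(n - 1) s^2}{\sigma_o^2} \le \chi^2_{\frac{\alpha}{2}}(n - 1) \ \text{or}\ \frac{(n - 1) s^2}{\sigma_o^2} \ge \chi^2_{1 - \frac{\alpha}{2}}(n - 1) \right\}$$

and the likelihood ratio test is: “Reject $H_o : \sigma^2 = \sigma_o^2$ if

$$\frac{(n - 1) S^2}{\sigma_o^2} \le \chi^2_{\frac{\alpha}{2}}(n - 1) \quad \text{or} \quad \frac{(n - 1) S^2}{\sigma_o^2} \ge \chi^2_{1 - \frac{\alpha}{2}}(n - 1)”$$

where $\chi^2_{\frac{\alpha}{2}}(n - 1)$ is a real number such that the integral of the chi-square
density function with $(n - 1)$ degrees of freedom from 0 to $\chi^2_{\frac{\alpha}{2}}(n - 1)$ is $\frac{\alpha}{2}$.
Further, $\chi^2_{1 - \frac{\alpha}{2}}(n - 1)$ denotes the real number such that the integral of the
chi-square density function with $(n - 1)$ degrees of freedom from $\chi^2_{1 - \frac{\alpha}{2}}(n - 1)$
to $\infty$ is $\frac{\alpha}{2}$.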
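
The two-sided shape of this critical region comes from the function $H(w) = w^n e^{-w}$, which increases up to $w = n$ and decreases afterwards, so the set where $H(w) \le K_o$ consists of a lower piece and an upper piece. The sketch below locates the two cutoffs by bisection for an arbitrarily chosen level; the values of `n` and `K_o`, and the helper `find_root`, are illustrative assumptions only. The cutoffs found this way give equal heights of $H$, whereas the test stated above uses equal-tail chi-square quantiles.

```python
import math


def H(w, n):
    """H(w) = w**n * exp(-w); increasing on (0, n), decreasing on (n, infinity)."""
    return w ** n * math.exp(-w)


def find_root(f, lo, hi, iterations=200):
    """Bisection for a root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if (f_lo <= 0) == (f_mid <= 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2.0


n = 5                       # illustrative exponent
K_o = 0.5 * H(n, n)         # illustrative level below the maximum H(n)

K1 = find_root(lambda w: H(w, n) - K_o, 0.0, float(n))   # lower cutoff
K2 = find_root(lambda w: H(w, n) - K_o, float(n), 60.0)  # upper cutoff

print("K1 =", K1)
print("K2 =", K2)
```

Any value of $w$ below `K1` or above `K2` satisfies $H(w) \le K_o$, which is exactly the two-sided rejection rule.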

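Remark 18.8. We summarize the three cases of hypothesis tests of the
variance (of the normal population with unknown mean) in the following
table.

$H_o$                     $H_a$                       Critical Region (or Test)

$\sigma^2 = \sigma_o^2$   $\sigma^2 > \sigma_o^2$     $\chi^2 = \dfrac{(n - 1) s^2}{\sigma_o^2} \ge \chi^2_{1 - \alpha}(n - 1)$

$\sigma^2 = \sigma_o^2$   $\sigma^2 < \sigma_o^2$     $\chi^2 = \dfrac{(n - 1) s^2}{\sigma_o^2} \le \chi^2_{\alpha}(n - 1)$

$\sigma^2 = \sigma_o^2$   $\sigma^2 \ne \sigma_o^2$   $\chi^2 = \dfrac{(n - 1) s^2}{\sigma_o^2} \ge \chi^2_{1 - \alpha/2}(n - 1)$
                                                      or
                                                      $\chi^2 = \dfrac{(n - 1) s^2}{\sigma_o^2} \le \chi^2_{\alpha/2}(n - 1)$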

18.5. Review Exercises 

1. Five trials Xi,X 2 ,..., X 5 of a Bernoulli experiment were conducted to test 
H a : p = \ against H a : p = |. The null hypothesis H a will be rejected if 
y^ =1 Xi = 5. Find the probability of Type I and Type II errors. 

2. A manufacturer of car batteries claims that the life of his batteries is
normally distributed with a standard deviation equal to 0.9 year. If a random
sample of 10 of these batteries has a standard deviation of 1.2 years, do you
think that $\sigma > 0.9$ year? Use a 0.05 level of significance.

3. Let $X_1, X_2, \ldots, X_8$ be a random sample of size 8 from a Poisson distribution
with parameter $\lambda$. Reject the null hypothesis $H_o : \lambda = 0.5$ if the observed
sum $\sum_{i=1}^{8} x_i \ge 8$. First, compute the significance level $\alpha$ of the test. Second,
find the power function $\beta(\lambda)$ of the test as a sum of Poisson probabilities
when $H_a$ is true.

4. Suppose X has the density function 

f l for 0 < x <6 
{0 otherwise. 

If one observation of $X$ is taken, what are the probabilities of Type I and
Type II errors in testing the null hypothesis $H_o : \theta = 1$ against the alternative
hypothesis $H_a : \theta = 2$, if $H_o$ is rejected for $X > 0.92$?

5. Let $X$ have the density function

$$f(x; \theta) = \begin{cases} (\theta + 1)\, x^{\theta} & \text{for } 0 < x < 1 \text{ where } \theta > 0 \\ 0 & \text{otherwise.} \end{cases}$$

The hypothesis $H_o : \theta = 1$ is to be rejected in favor of $H_a : \theta = 2$ if $X > 0.90$.
What is the probability of Type I error?

6. Let $X_1, X_2, \ldots, X_6$ be a random sample from a distribution with density
function

$$f(x; \theta) = \begin{cases} \theta\, x^{\theta - 1} & \text{for } 0 < x < 1 \text{ where } \theta > 0 \\ 0 & \text{otherwise.} \end{cases}$$

The null hypothesis $H_o : \theta = 1$ is to be rejected in favor of the alternative
$H_a : \theta > 1$ if and only if at least 5 of the sample observations are larger than
0.7. What is the significance level of the test?

7. A researcher wants to test $H_o : \theta = 0$ versus $H_a : \theta = 1$, where $\theta$ is a
parameter of a population of interest. The statistic $W$, based on a random
sample of the population, is used to test the hypothesis. Suppose that under
$H_o$, $W$ has a normal distribution with mean 0 and variance 1, and under $H_a$,
$W$ has a normal distribution with mean 4 and variance 1. If $H_o$ is rejected
when $W > 1.50$, then what are the probabilities of a Type I and a Type II error,
respectively?

8. Let $X_1$ and $X_2$ be a random sample of size 2 from a normal distribution
$N(\mu, 1)$. Find the likelihood ratio critical region of size 0.005 for testing the
null hypothesis $H_o : \mu = 0$ against the composite alternative $H_a : \mu \ne 0$.

9. Let $X_1, X_2, \ldots, X_{10}$ be a random sample from a Poisson distribution with
mean $\theta$. What is the most powerful (or best) critical region of size 0.08 for
testing the null hypothesis $H_o : \theta = 0.1$ against $H_a : \theta = 0.5$?

10 . Let X be a random sample of size 1 from a distribution with probability 
density function 

f (1 — t) + 0 * if 0 < 2: < 1 

/M) = j 

^ 0 otherwise. 

For a significance level $\alpha = 0.1$, what is the best (or uniformly most powerful)
critical region for testing the null hypothesis $H_o : \theta = -1$ against $H_a : \theta = 1$?


11. Let X \, X 2 be a random sample of size 2 from a distribution with prob¬ 
ability density function 


f(x\ 0 ) 


6 X f; if * = 0,1,2,3,.... 

0 otherwise, 


where $\theta > 0$. For a significance level $\alpha = 0.053$, what is the best critical
region for testing the null hypothesis $H_o : \theta = 1$ against $H_a : \theta = 2$? Sketch
the graph of the best critical region.


12 . Let Xl, X 2 ,..., Xg be a random sample of size 8 from a distribution with 
probability density function 


f{x\ 0 ) 


if * = 0,1,2,3,.... 
0 otherwise, 


where $\theta > 0$. What is the likelihood ratio critical region for testing the null
hypothesis $H_o : \theta = 1$ against $H_a : \theta \ne 1$? If $\alpha = 0.1$, can you determine the
best likelihood ratio critical region?


13. Let Xi, X' 2 ,..., X n be a random sample of size n from a distribution with 
probability density function 


f{x\f 3 ) 


r(7)/3 7 > 


if * > 0 


0 


otherwise, 





where $\beta > 0$. What is the likelihood ratio critical region for testing the null
hypothesis $H_o : \beta = 5$ against $H_a : \beta \ne 5$? What is the most powerful test?

14. Let $X_1, X_2, \ldots, X_5$ denote a random sample of size 5 from a population
$X$ with probability density function

$$f(x; \theta) = \begin{cases} (1 - \theta)^{x - 1}\, \theta & \text{if } x = 1, 2, 3, \ldots, \infty \\ 0 & \text{otherwise,} \end{cases}$$

where $0 < \theta < 1$ is a parameter. What is the likelihood ratio critical region
of size 0.05 for testing $H_o : \theta = 0.5$ versus $H_a : \theta \ne 0.5$?

15. Let $X_1, X_2, X_3$ denote a random sample of size 3 from a population $X$
with probability density function

$$f(x; \mu) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2}}, \qquad -\infty < x < \infty,$$

where $-\infty < \mu < \infty$ is a parameter. What is the likelihood ratio critical
region of size 0.05 for testing $H_o : \mu = 3$ versus $H_a : \mu \ne 3$?

16. Let Xi, X 2 , X 3 denote a random sample of size 3 from a population X 
with probability density function 


fix; 9) 


he 


0 


if 0 < x < 00 
otherwise, 


where $0 < \theta < \infty$ is a parameter. What is the likelihood ratio critical region
for testing $H_o : \theta = 3$ versus $H_a : \theta \ne 3$?

17. Let X\ 1 X 2 ,X 3 denote a random sample of size 3 from a population X 
with probability density function 


fix; 9) 


if x = 0,1,2,3 ,...,00 
0 otherwise, 


where $0 < \theta < \infty$ is a parameter. What is the likelihood ratio critical region
for testing $H_o : \theta = 0.1$ versus $H_a : \theta \ne 0.1$?

18. A box contains 4 marbles, $\theta$ of which are white and the rest are black.
A sample of size 2 is drawn to test $H_o : \theta = 2$ versus $H_a : \theta \ne 2$. If the null
hypothesis is rejected if both marbles are the same color, find the significance
level of the test.


19. Let Xi, X 2 , X 3 denote a random sample of size 3 from a population X 
with probability density function 


f(x\6) = 



for 0 < x < 9 
otherwise, 


where 0 < 9 < oo is a parameter. What is the likelihood ratio critical region 
of size pjg for testing H 0 : 9 = 5 versus H a : 9 ^ 5? 

20. Let X \, X 2 and X 3 denote three independent observations from a dis¬ 
tribution with density 




—e 3 


0 


for 0 < x < oo 
otherwise, 


where $0 < \beta < \infty$ is a parameter. What is the best (or uniformly most
powerful) critical region for testing $H_o : \beta = 5$ versus $H_a : \beta = 10$?

21. Suppose X has the density function 


f{x; 0) 


\ for 0 < x < 9 
0 otherwise. 


If X 2 , X 3 , X 4 is a random sample of size 4 taken from X, what are the 
probabilities of Type I and Type II errors in testing the null hypothesis 
H a : 9 = 1 against the alternative hypothesis H a : 9 = 2, if H a is rejected for 
max{Xi,X 2 ,X 3 , X 4 } < 

22. Let Xi, X 2 ,X 3 denote a random sample of size 3 from a population X 
with probability density function 


f(x; 9) 


d e 1 


0 


if 0 < x < 00 
otherwise, 


where $0 < \theta < \infty$ is a parameter. The null hypothesis $H_o : \theta = 3$ is to be
rejected in favor of the alternative $H_a : \theta \ne 3$ if and only if $X > 6.296$. What
is the significance level of the test?

Chapter 19

SIMPLE LINEAR REGRESSION AND CORRELATION ANALYSIS

Let $X$ and $Y$ be two random variables with joint probability density
function $f(x, y)$. Then the conditional density of $Y$ given that $X = x$ is

$$f(y/x) = \frac{f(x, y)}{g(x)},$$

where

$$g(x) = \int_{-\infty}^{\infty} f(x, y)\, dy$$

is the marginal density of $X$. The conditional mean of $Y$

$$E(Y / X = x) = \int_{-\infty}^{\infty} y\, f(y/x)\, dy$$

is called the regression equation of $Y$ on $X$.

Example 19.1. Let $X$ and $Y$ be two random variables with the joint
probability density function

$$f(x, y) = \begin{cases} x\, e^{-x(1+y)} & \text{if } x > 0,\ y > 0 \\ 0 & \text{otherwise.} \end{cases}$$

Find the regression equation of $Y$ on $X$ and then sketch the regression curve.





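Answer: The marginal density of $X$ is given by

$$g(x) = \int_{-\infty}^{\infty} f(x, y)\, dy = \int_{0}^{\infty} x\, e^{-x(1+y)}\, dy = x\, e^{-x} \int_{0}^{\infty} e^{-xy}\, dy = x\, e^{-x} \left[ -\frac{1}{x}\, e^{-xy} \right]_{0}^{\infty} = e^{-x}.$$

The conditional density of $Y$ given $X = x$ is

$$f(y/x) = \frac{f(x, y)}{g(x)} = \frac{x\, e^{-x(1+y)}}{e^{-x}} = x\, e^{-xy}, \qquad y > 0.$$

The conditional mean of $Y$ given $X = x$ is given by

$$E(Y/x) = \int_{-\infty}^{\infty} y\, f(y/x)\, dy = \int_{0}^{\infty} y\, x\, e^{-xy}\, dy = \frac{1}{x}.$$

Thus the regression equation of $Y$ on $X$ is

$$E(Y/x) = \frac{1}{x}, \qquad x > 0.$$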

The graph of this equation of Y on X is shown below. 
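As a numerical cross-check of the regression equation $E(Y/x) = 1/x$ obtained in Example 19.1, the following minimal sketch approximates the conditional mean by the midpoint rule. The function name `conditional_mean`, the truncation point and the number of steps are illustrative choices, not part of the text.

```python
import math

def conditional_mean(x, upper=60.0, steps=200_000):
    """Approximate E(Y/x) = integral over y > 0 of y * x * exp(-x*y) dy (midpoint rule)."""
    # The integrand decays like exp(-x*y), so truncating the integral at
    # y = upper / x leaves a negligible tail for any x > 0.
    y_max = upper / x
    h = y_max / steps
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) * h          # midpoint of the k-th subinterval
        total += y * x * math.exp(-x * y)
    return total * h

for x in (0.5, 1.0, 2.0, 4.0):
    # The two printed numbers should agree closely: numeric value and 1/x.
    print(x, conditional_mean(x), 1.0 / x)
```

Plotting the pairs $(x, 1/x)$ for $x > 0$ gives the regression curve, a branch of a hyperbola.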
From this example it is clear that the conditional mean $E(Y/x)$ is a
function of $x$. If this function is of the form $\alpha + \beta x$, then the corresponding
regression equation is called a linear regression equation; otherwise it is
called a nonlinear regression equation. The term linear regression refers to
a specification that is linear in the parameters. Thus $E(Y/x) = \alpha + \beta x^2$ is
also a linear regression equation. The regression equation $E(Y/x) = \alpha x^{\beta}$ is
an example of a nonlinear regression equation.

The main purpose of regression analysis is to predict $Y_i$ from the
knowledge of $x_i$ using a relationship like

$$E(Y_i/x_i) = \alpha + \beta x_i.$$

The $Y_i$ is called the response or dependent variable, whereas $x_i$ is called the
predictor or independent variable. The term regression has an interesting
history, dating back to Francis Galton (1822-1911). Galton studied the heights
of fathers and sons, in which he observed a regression (a "turning back")
from the heights of sons to the heights of their fathers. That is, tall fathers
tend to have tall sons and short fathers tend to have short sons. However,
he also found that very tall fathers tend to have shorter sons and very short
fathers tend to have taller sons. Galton called this phenomenon regression
towards the mean.

In regression analysis, that is, when investigating the relationship
between a predictor and a response variable, there are two steps to the analysis.
The first step is totally data oriented. This step is always performed. The
second step is the statistical one, in which we draw conclusions about the
(population) regression equation $E(Y_i/x_i)$. Normally the regression
equation contains several parameters. There are two well-known methods for
finding the estimates of the parameters of the regression equation. These
two methods are: (1) the least squares method and (2) the normal regression
method.

19.1. The Least Squares Method 

Let $\{(x_i, y_i) \mid i = 1, 2, ..., n\}$ be a set of data. Assume that

$$E(Y_i/x_i) = \alpha + \beta x_i, \tag{1}$$

that is

$$y_i = \alpha + \beta x_i, \qquad i = 1, 2, ..., n.$$

Then the sum of the squares of the error is given by

$$\mathcal{E}(\alpha, \beta) = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2. \tag{2}$$

The least squares estimates of $\alpha$ and $\beta$ are defined to be those values which
minimize $\mathcal{E}(\alpha, \beta)$. That is,

$$(\hat{\alpha}, \hat{\beta}) = \arg\min_{(\alpha, \beta)} \mathcal{E}(\alpha, \beta).$$

This least squares method is due to Adrien M. Legendre (1752-1833). Note
that the least squares method also works even if the regression equation is
nonlinear (that is, not of the form (1)).

Next, we give several examples to illustrate the method of least squares. 

Example 19.2. Given the five pairs of points $(x, y)$ shown in the table below

    x     4     0    -2     3     1
    y     5     0     0     6     3

what is the line of the form $y = x + b$ that best fits the data by the method of
least squares?

Answer: Suppose the best fit line is $y = x + b$. Then for each $x_i$, $x_i + b$ is
the estimated value of $y_i$. The difference between $y_i$ and the estimated value
of $y_i$ is the error or the residual corresponding to the $i$th measurement. That
is, the error corresponding to the $i$th measurement is given by

$$\epsilon_i = y_i - x_i - b.$$

Hence the sum of the squares of the errors is

$$\mathcal{E}(b) = \sum_{i=1}^{5} \epsilon_i^2 = \sum_{i=1}^{5} (y_i - x_i - b)^2.$$

Differentiating $\mathcal{E}(b)$ with respect to $b$, we get

$$\frac{d}{db}\,\mathcal{E}(b) = 2 \sum_{i=1}^{5} (y_i - x_i - b)\,(-1).$$

Setting $\frac{d}{db}\,\mathcal{E}(b)$ equal to 0, we get

$$\sum_{i=1}^{5} (y_i - x_i - b) = 0$$

which is

$$5b = \sum_{i=1}^{5} y_i - \sum_{i=1}^{5} x_i.$$

Using the data, we see that

$$5b = 14 - 6$$

which yields $b = \frac{8}{5}$. Hence the best fitted line is

$$y = x + \frac{8}{5}.$$

Example 19.3. Suppose the line $y = bx + 1$ is fit by the method of least
squares to the 3 data points

    x     1     2     4
    y     2     2     0

What is the value of the constant $b$?

Answer: The error corresponding to the $i$th measurement is given by

$$\epsilon_i = y_i - b x_i - 1.$$

Hence the sum of the squares of the errors is

$$\mathcal{E}(b) = \sum_{i=1}^{3} \epsilon_i^2 = \sum_{i=1}^{3} (y_i - b x_i - 1)^2.$$

Differentiating $\mathcal{E}(b)$ with respect to $b$, we get

$$\frac{d}{db}\,\mathcal{E}(b) = 2 \sum_{i=1}^{3} (y_i - b x_i - 1)\,(-x_i).$$

Setting $\frac{d}{db}\,\mathcal{E}(b)$ equal to 0, we get

$$\sum_{i=1}^{3} (y_i - b x_i - 1)\, x_i = 0$$

which in turn yields

$$b = \frac{\sum_{i=1}^{3} x_i y_i - \sum_{i=1}^{3} x_i}{\sum_{i=1}^{3} x_i^2}.$$

Using the given data we see that

$$b = \frac{6 - 7}{21} = -\frac{1}{21},$$

and the best fitted line is

$$y = -\frac{1}{21}\, x + 1.$$
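The closed forms obtained in Examples 19.2 and 19.3 can be checked with exact rational arithmetic. The sketch below is only an illustration; the helper names `intercept_for_unit_slope` and `slope_for_unit_intercept` are hypothetical and simply encode the two formulas derived above.

```python
from fractions import Fraction

def intercept_for_unit_slope(xs, ys):
    """Least squares b for the model y = x + b: b = (sum y - sum x) / n."""
    return Fraction(sum(ys) - sum(xs), len(xs))

def slope_for_unit_intercept(xs, ys):
    """Least squares b for the model y = b*x + 1: b = (sum x*y - sum x) / sum x^2."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return Fraction(sum_xy - sum(xs), sum_xx)

# Data of Example 19.2 and Example 19.3, respectively.
print(intercept_for_unit_slope([4, 0, -2, 3, 1], [5, 0, 0, 6, 3]))  # 8/5
print(slope_for_unit_intercept([1, 2, 4], [2, 2, 0]))              # -1/21
```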


Example 19.4. Observations $y_1, y_2, ..., y_n$ are assumed to come from a model
with

$$E(Y_i/x_i) = \theta + 2 \ln x_i$$

where $\theta$ is an unknown parameter and $x_1, x_2, ..., x_n$ are given constants. What
is the least squares estimate of the parameter $\theta$?

Answer: The sum of the squares of errors is

$$\mathcal{E}(\theta) = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \theta - 2 \ln x_i)^2.$$

Differentiating $\mathcal{E}(\theta)$ with respect to $\theta$, we get

$$\frac{d}{d\theta}\,\mathcal{E}(\theta) = 2 \sum_{i=1}^{n} (y_i - \theta - 2 \ln x_i)\,(-1).$$

Setting $\frac{d}{d\theta}\,\mathcal{E}(\theta)$ equal to 0, we get

$$\sum_{i=1}^{n} (y_i - \theta - 2 \ln x_i) = 0$$

which is

$$\theta = \frac{1}{n} \left( \sum_{i=1}^{n} y_i - 2 \sum_{i=1}^{n} \ln x_i \right).$$

Hence the least squares estimate of $\theta$ is $\hat{\theta} = \bar{y} - \frac{2}{n} \sum_{i=1}^{n} \ln x_i$.

Example 19.5. Given the three pairs of points $(x, y)$ shown below:

    x     4     1     2
    y     2     1     0

What is the curve of the form $y = x^{\beta}$ that best fits the data by the method of
least squares?

Answer: The sum of the squares of the errors is given by

$$\mathcal{E}(\beta) = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} \left( y_i - x_i^{\beta} \right)^2.$$

Differentiating $\mathcal{E}(\beta)$ with respect to $\beta$, we get

$$\frac{d}{d\beta}\,\mathcal{E}(\beta) = 2 \sum_{i=1}^{n} \left( y_i - x_i^{\beta} \right) \left( -x_i^{\beta} \ln x_i \right).$$

Setting this derivative $\frac{d}{d\beta}\,\mathcal{E}(\beta)$ to 0, we get

$$\sum_{i=1}^{n} y_i\, x_i^{\beta} \ln x_i = \sum_{i=1}^{n} x_i^{\beta}\, x_i^{\beta} \ln x_i.$$

Using the given data we obtain

$$(2)\, 4^{\beta} \ln 4 = 4^{2\beta} \ln 4 + 2^{2\beta} \ln 2$$

which simplifies to

$$4 = (2)\, 4^{\beta} + 1$$

or

$$4^{\beta} = \frac{3}{2}.$$

Taking the natural logarithm of both sides of the above expression, we get

$$\beta = \frac{\ln 3 - \ln 2}{\ln 4} = 0.2925.$$

Thus the least squares best fit model is $y = x^{0.2925}$.
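The value of $\beta$ in Example 19.5 can also be found by minimizing the sum of squared errors numerically. The sketch below uses a golden-section search, which is a swapped-in numerical technique rather than the calculus argument of the text; the search interval $[-2, 2]$ and the tolerance are assumptions made for illustration. For these data the sum of squares equals $u^2 - 3u + 4$ with $u = 4^{\beta}$, so it has a single minimum and the search is justified.

```python
import math

xs = [4.0, 1.0, 2.0]
ys = [2.0, 1.0, 0.0]

def sse(beta):
    """Sum of squared errors for the model y = x**beta."""
    return sum((y - x ** beta) ** 2 for x, y in zip(xs, ys))

def golden_section_min(f, lo, hi, tol=1e-9):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c                   # minimum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d                   # minimum lies in [c, b]
            d = a + inv_phi * (b - a)
    return (a + b) / 2.0

beta_numeric = golden_section_min(sse, -2.0, 2.0)
beta_closed = (math.log(3.0) - math.log(2.0)) / math.log(4.0)
print(beta_numeric, beta_closed)          # both are approximately 0.2925
```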

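Example 19.6. Observations $y_1, y_2, ..., y_n$ are assumed to come from a model
with $E(Y_i/x_i) = \alpha + \beta x_i$, where $\alpha$ and $\beta$ are unknown parameters, and
$x_1, x_2, ..., x_n$ are given constants. What are the least squares estimates of the
parameters $\alpha$ and $\beta$?

Answer: The sum of the squares of the errors is given by

$$\mathcal{E}(\alpha, \beta) = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2.$$

Differentiating $\mathcal{E}(\alpha, \beta)$ with respect to $\alpha$ and $\beta$ respectively, we get

$$\frac{\partial}{\partial \alpha}\,\mathcal{E}(\alpha, \beta) = 2 \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)\,(-1)$$

and

$$\frac{\partial}{\partial \beta}\,\mathcal{E}(\alpha, \beta) = 2 \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)\,(-x_i).$$

Setting these partial derivatives $\frac{\partial}{\partial \alpha}\mathcal{E}(\alpha, \beta)$ and $\frac{\partial}{\partial \beta}\mathcal{E}(\alpha, \beta)$ to 0, we get

$$\sum_{i=1}^{n} (y_i - \alpha - \beta x_i) = 0 \tag{3}$$

and

$$\sum_{i=1}^{n} (y_i - \alpha - \beta x_i)\, x_i = 0. \tag{4}$$

From (3), we obtain

$$\sum_{i=1}^{n} y_i = n \alpha + \beta \sum_{i=1}^{n} x_i$$

which is

$$\bar{y} = \alpha + \beta\, \bar{x}. \tag{5}$$

Similarly, from (4), we have

$$\sum_{i=1}^{n} x_i y_i = \alpha \sum_{i=1}^{n} x_i + \beta \sum_{i=1}^{n} x_i^2$$

which can be rewritten as follows

$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) + n \bar{x} \bar{y} = n \alpha \bar{x} + \beta \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x}) + n \beta \bar{x}^2. \tag{6}$$

Defining

$$S_{xy} := \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \qquad \text{and} \qquad S_{xx} := \sum_{i=1}^{n} (x_i - \bar{x})^2,$$

we see that (6) reduces to

$$S_{xy} + n \bar{x} \bar{y} = \alpha\, n \bar{x} + \beta \left[ S_{xx} + n \bar{x}^2 \right]. \tag{7}$$

Substituting (5) into (7), we have

$$S_{xy} + n \bar{x} \bar{y} = \left[ \bar{y} - \beta \bar{x} \right] n \bar{x} + \beta \left[ S_{xx} + n \bar{x}^2 \right].$$

Simplifying the last equation, we get

$$S_{xy} = \beta\, S_{xx}$$

which is

$$\beta = \frac{S_{xy}}{S_{xx}}. \tag{8}$$

In view of (8) and (5), we get

$$\alpha = \bar{y} - \frac{S_{xy}}{S_{xx}}\, \bar{x}. \tag{9}$$

Thus the least squares estimates of $\alpha$ and $\beta$ are

$$\hat{\alpha} = \bar{y} - \frac{S_{xy}}{S_{xx}}\, \bar{x} \qquad \text{and} \qquad \hat{\beta} = \frac{S_{xy}}{S_{xx}},$$

respectively.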
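The formulas (8) and (9) translate directly into a few lines of code. The following is a minimal sketch, with the function name `least_squares_line` chosen here for illustration; it is applied to the five data points of Example 19.2, this time without fixing the slope, and it confirms that the fitted line satisfies the normal equations (3) and (4).

```python
def least_squares_line(xs, ys):
    """Return (alpha_hat, beta_hat) for the model E(Y/x) = alpha + beta*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = s_xy / s_xx                   # equation (8)
    alpha_hat = y_bar - beta_hat * x_bar     # equation (9)
    return alpha_hat, beta_hat

xs = [4, 0, -2, 3, 1]
ys = [5, 0, 0, 6, 3]
alpha_hat, beta_hat = least_squares_line(xs, ys)
residuals = [y - alpha_hat - beta_hat * x for x, y in zip(xs, ys)]

print(alpha_hat, beta_hat)
# Both sums below should vanish up to rounding error: equations (3) and (4).
print(sum(residuals), sum(r * x for r, x in zip(residuals, xs)))
```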

We need some notation. The random variable $Y$ given $X = x$ will be
denoted by $Y_x$. Note that this is the variable that appears in the model $E(Y/x) = \alpha + \beta x$.
When one chooses in succession values $x_1, x_2, ..., x_n$ for $x$, a sequence
$Y_{x_1}, Y_{x_2}, ..., Y_{x_n}$ of random variables is obtained. For the sake of convenience,
we denote the random variables $Y_{x_1}, Y_{x_2}, ..., Y_{x_n}$ simply as $Y_1, Y_2, ..., Y_n$. To
do some statistical analysis, we make the following three assumptions:

(1) $E(Y_x) = \alpha + \beta x$ so that $\mu_i = E(Y_i) = \alpha + \beta x_i$;

(2) $Y_1, Y_2, ..., Y_n$ are independent;

(3) Each of the random variables $Y_1, Y_2, ..., Y_n$ has the same variance $\sigma^2$.

Theorem 19.1. Under the above three assumptions, the least squares
estimators $\hat{\alpha}$ and $\hat{\beta}$ of a linear model $E(Y/x) = \alpha + \beta x$ are unbiased.


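Proof: From the previous example, we know that the least squares
estimators of $\alpha$ and $\beta$ are

$$\hat{\alpha} = \bar{Y} - \frac{S_{xY}}{S_{xx}}\, \bar{x} \qquad \text{and} \qquad \hat{\beta} = \frac{S_{xY}}{S_{xx}},$$

where

$$S_{xY} := \sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y}).$$

First, we show $\hat{\beta}$ is unbiased. Consider

$$\begin{aligned}
E(\hat{\beta}) &= E\left( \frac{S_{xY}}{S_{xx}} \right) = \frac{1}{S_{xx}}\, E(S_{xY}) \\
&= \frac{1}{S_{xx}}\, E\left( \sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y}) \right) \\
&= \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})\, E(Y_i - \bar{Y}) \\
&= \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})\, E(Y_i) - \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})\, E(\bar{Y}) \\
&= \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})\, E(Y_i) - \frac{1}{S_{xx}}\, E(\bar{Y}) \sum_{i=1}^{n} (x_i - \bar{x}) \\
&= \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})\, E(Y_i) = \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})(\alpha + \beta x_i) \\
&= \alpha\, \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x}) + \beta\, \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})\, x_i \\
&= \beta\, \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})\, x_i \\
&= \beta\, \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})\, x_i - \beta\, \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})\, \bar{x} \\
&= \beta\, \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x}) \\
&= \beta\, \frac{1}{S_{xx}}\, S_{xx} = \beta.
\end{aligned}$$

Thus the estimator $\hat{\beta}$ is an unbiased estimator of the parameter $\beta$.

Next, we show that $\hat{\alpha}$ is also an unbiased estimator of $\alpha$. Consider

$$\begin{aligned}
E(\hat{\alpha}) &= E\left( \bar{Y} - \frac{S_{xY}}{S_{xx}}\, \bar{x} \right) = E(\bar{Y}) - \bar{x}\, E\left( \frac{S_{xY}}{S_{xx}} \right) \\
&= E(\bar{Y}) - \bar{x}\, E(\hat{\beta}) = E(\bar{Y}) - \bar{x}\, \beta \\
&= \frac{1}{n} \left( \sum_{i=1}^{n} E(Y_i) \right) - \bar{x}\, \beta \\
&= \frac{1}{n} \left( \sum_{i=1}^{n} (\alpha + \beta x_i) \right) - \bar{x}\, \beta \\
&= \frac{1}{n} \left( n \alpha + \beta \sum_{i=1}^{n} x_i \right) - \bar{x}\, \beta \\
&= \alpha + \beta \bar{x} - \bar{x}\, \beta = \alpha.
\end{aligned}$$

This proves that $\hat{\alpha}$ is an unbiased estimator of $\alpha$ and the proof of the theorem
is now complete.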
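Theorem 19.1 can be illustrated by simulation: averaging the least squares estimates over many independent samples should reproduce the true parameters. The sketch below is only an illustration under assumed values ($\alpha = 2$, $\beta = 0.5$, $\sigma = 1$, eight design points, normal errors, a fixed random seed); none of these numbers come from the text, and the theorem itself does not require normality.

```python
import random

def least_squares_line(xs, ys):
    """Return (alpha_hat, beta_hat) computed from S_xy / S_xx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = s_xy / s_xx
    return y_bar - beta_hat * x_bar, beta_hat

def average_estimates(alpha, beta, sigma, xs, trials, seed):
    """Average alpha_hat and beta_hat over repeated simulated samples."""
    rng = random.Random(seed)
    sum_alpha = sum_beta = 0.0
    for _ in range(trials):
        # Y_i has mean alpha + beta*x_i and common variance sigma**2.
        ys = [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]
        a_hat, b_hat = least_squares_line(xs, ys)
        sum_alpha += a_hat
        sum_beta += b_hat
    return sum_alpha / trials, sum_beta / trials

mean_alpha, mean_beta = average_estimates(2.0, 0.5, 1.0, [1, 2, 3, 4, 5, 6, 7, 8], 20000, 12345)
print(mean_alpha, mean_beta)   # close to the true values 2.0 and 0.5
```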

19.2. The Normal Regression Analysis 


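In a regression analysis, we assume that the $x_i$'s are constants while the $y_i$'s
are values of the random variables $Y_i$'s. A regression analysis is called a
normal regression analysis if the conditional density of $Y_i$ given $X_i = x_i$ is of
the form

$$f(y_i/x_i) = \frac{1}{\sqrt{2 \pi \sigma^2}}\; e^{-\frac{1}{2} \left( \frac{y_i - \alpha - \beta x_i}{\sigma} \right)^2},$$

where $\sigma^2$ denotes the variance, and $\alpha$ and $\beta$ are the regression coefficients.
That is $Y|_{x_i} \sim N(\alpha + \beta x_i, \sigma^2)$. If there is no danger of confusion, then we
will write $Y_i$ for $Y|_{x_i}$. The figure on the next page shows the regression
model of $Y$ with equal variances, and with means falling on the straight line
$\mu_y = \alpha + \beta x$.

Normal regression analysis is concerned with the estimation of $\sigma$, $\alpha$, and
$\beta$. We use the maximum likelihood method to estimate these parameters. The
likelihood function of the sample is given by

$$L(\sigma, \alpha, \beta) = \prod_{i=1}^{n} f(y_i/x_i)$$

and

$$\begin{aligned}
\ln L(\sigma, \alpha, \beta) &= \sum_{i=1}^{n} \ln f(y_i/x_i) \\
&= -n \ln \sigma - \frac{n}{2} \ln(2\pi) - \frac{1}{2 \sigma^2} \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2.
\end{aligned}$$

Taking the partial derivatives of $\ln L(\sigma, \alpha, \beta)$ with respect to $\alpha$, $\beta$ and $\sigma$
respectively, we get

$$\frac{\partial}{\partial \alpha} \ln L(\sigma, \alpha, \beta) = \frac{1}{\sigma^2} \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)$$

$$\frac{\partial}{\partial \beta} \ln L(\sigma, \alpha, \beta) = \frac{1}{\sigma^2} \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)\, x_i$$

$$\frac{\partial}{\partial \sigma} \ln L(\sigma, \alpha, \beta) = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2.$$

Equating each of these partial derivatives to zero and solving the system of
three equations, we obtain the maximum likelihood estimators of $\beta$, $\alpha$, $\sigma$ as

$$\hat{\beta} = \frac{S_{xY}}{S_{xx}}, \qquad \hat{\alpha} = \bar{Y} - \frac{S_{xY}}{S_{xx}}\, \bar{x}, \qquad \hat{\sigma} = \sqrt{ \frac{1}{n} \left[ S_{YY} - \frac{S_{xY}}{S_{xx}}\, S_{xY} \right] },$$

where

$$S_{xY} = \sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y}).$$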

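Theorem 19.2. In the normal regression analysis, the likelihood estimators
$\hat{\beta}$ and $\hat{\alpha}$ are unbiased estimators of $\beta$ and $\alpha$, respectively.

Proof: Recall that

$$\hat{\beta} = \frac{S_{xY}}{S_{xx}} = \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y}) = \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{S_{xx}} \right) Y_i,$$

where $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$. Thus $\hat{\beta}$ is a linear combination of the $Y_i$'s. Since
$Y_i \sim N(\alpha + \beta x_i, \sigma^2)$, we see that $\hat{\beta}$ is also a normal random variable.

First we show $\hat{\beta}$ is an unbiased estimator of $\beta$. Since

$$\begin{aligned}
E(\hat{\beta}) &= E\left( \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{S_{xx}} \right) Y_i \right) \\
&= \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{S_{xx}} \right) E(Y_i) \\
&= \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{S_{xx}} \right) (\alpha + \beta x_i) = \beta,
\end{aligned}$$

the maximum likelihood estimator of $\beta$ is unbiased.

Next, we show that $\hat{\alpha}$ is also an unbiased estimator of $\alpha$. Consider

$$\begin{aligned}
E(\hat{\alpha}) &= E\left( \bar{Y} - \frac{S_{xY}}{S_{xx}}\, \bar{x} \right) = E(\bar{Y}) - \bar{x}\, E\left( \frac{S_{xY}}{S_{xx}} \right) \\
&= E(\bar{Y}) - \bar{x}\, E(\hat{\beta}) = E(\bar{Y}) - \bar{x}\, \beta \\
&= \frac{1}{n} \left( \sum_{i=1}^{n} E(Y_i) \right) - \bar{x}\, \beta \\
&= \frac{1}{n} \left( \sum_{i=1}^{n} (\alpha + \beta x_i) \right) - \bar{x}\, \beta \\
&= \frac{1}{n} \left( n \alpha + \beta \sum_{i=1}^{n} x_i \right) - \bar{x}\, \beta \\
&= \alpha + \beta \bar{x} - \bar{x}\, \beta = \alpha.
\end{aligned}$$

This proves that $\hat{\alpha}$ is an unbiased estimator of $\alpha$ and the proof of the theorem
is now complete.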

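Theorem 19.3. In normal regression analysis, the distributions of the
estimators $\hat{\beta}$ and $\hat{\alpha}$ are given by

$$\hat{\beta} \sim N\left( \beta,\ \frac{\sigma^2}{S_{xx}} \right) \qquad \text{and} \qquad \hat{\alpha} \sim N\left( \alpha,\ \frac{\sigma^2}{n} + \frac{\bar{x}^2 \sigma^2}{S_{xx}} \right),$$

where

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2.$$

Proof: Since

$$\hat{\beta} = \frac{S_{xY}}{S_{xx}} = \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y}) = \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{S_{xx}} \right) Y_i,$$

the $\hat{\beta}$ is a linear combination of the $Y_i$'s. As $Y_i \sim N(\alpha + \beta x_i, \sigma^2)$, we see that
$\hat{\beta}$ is also a normal random variable. By Theorem 19.2, $\hat{\beta}$ is an unbiased
estimator of $\beta$.

The variance of $\hat{\beta}$ is given by

$$\begin{aligned}
Var(\hat{\beta}) &= \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{S_{xx}} \right)^2 Var(Y_i/x_i) \\
&= \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{S_{xx}} \right)^2 \sigma^2 \\
&= \frac{1}{S_{xx}^2} \sum_{i=1}^{n} (x_i - \bar{x})^2\, \sigma^2 \\
&= \frac{\sigma^2}{S_{xx}}.
\end{aligned}$$

Hence $\hat{\beta}$ is a normal random variable with mean (or expected value) $\beta$ and
variance $\frac{\sigma^2}{S_{xx}}$. That is $\hat{\beta} \sim N\left( \beta, \frac{\sigma^2}{S_{xx}} \right)$.

Now we determine the distribution of $\hat{\alpha}$. Since each $Y_i \sim N(\alpha + \beta x_i, \sigma^2)$,
the distribution of $\bar{Y}$ is given by

$$\bar{Y} \sim N\left( \alpha + \beta \bar{x},\ \frac{\sigma^2}{n} \right).$$

Since

$$\hat{\beta} \sim N\left( \beta,\ \frac{\sigma^2}{S_{xx}} \right),$$

the distribution of $\bar{x}\, \hat{\beta}$ is given by

$$\bar{x}\, \hat{\beta} \sim N\left( \bar{x}\, \beta,\ \bar{x}^2\, \frac{\sigma^2}{S_{xx}} \right).$$

Since $\hat{\alpha} = \bar{Y} - \bar{x}\, \hat{\beta}$, and $\bar{Y}$ and $\bar{x}\, \hat{\beta}$ are two normal random variables, $\hat{\alpha}$ is
also a normal random variable with mean equal to $\alpha + \beta \bar{x} - \beta \bar{x} = \alpha$ and
variance equal to $\frac{\sigma^2}{n} + \frac{\bar{x}^2 \sigma^2}{S_{xx}}$. That is

$$\hat{\alpha} \sim N\left( \alpha,\ \frac{\sigma^2}{n} + \frac{\bar{x}^2 \sigma^2}{S_{xx}} \right)$$

and the proof of the theorem is now complete.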

It should be noted that in the proof of the last theorem, we have assumed
the fact that $\bar{Y}$ and $\bar{x}\, \hat{\beta}$ are statistically independent.

In the next theorem, we give an unbiased estimator of the variance $\sigma^2$.
For this we need the distribution of the statistic $U$ given by

$$U = \frac{n\, \hat{\sigma}^2}{\sigma^2}.$$

It can be shown (we will omit the proof; for a proof see Graybill (1961)) that
the distribution of the statistic

$$U = \frac{n\, \hat{\sigma}^2}{\sigma^2} \sim \chi^2(n - 2).$$

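Theorem 19.4. An unbiased estimator $S^2$ of $\sigma^2$ is given by

$$S^2 = \frac{n\, \hat{\sigma}^2}{n - 2},$$

where $\hat{\sigma} = \sqrt{ \frac{1}{n} \left[ S_{YY} - \frac{S_{xY}}{S_{xx}}\, S_{xY} \right] }$.

Proof: Since

$$\begin{aligned}
E(S^2) &= E\left( \frac{n\, \hat{\sigma}^2}{n - 2} \right) \\
&= \frac{\sigma^2}{n - 2}\, E\left( \frac{n\, \hat{\sigma}^2}{\sigma^2} \right) \\
&= \frac{\sigma^2}{n - 2}\, E\left( \chi^2(n - 2) \right) \\
&= \frac{\sigma^2}{n - 2}\, (n - 2) = \sigma^2,
\end{aligned}$$

the estimator $S^2$ is an unbiased estimator of $\sigma^2$. The proof of the theorem is
now complete.

Note that the estimator $S^2$ can be written as $S^2 = \frac{SSE}{n - 2}$, where

$$SSE = S_{YY} - \hat{\beta}\, S_{xY} = \sum_{i=1}^{n} \left[ y_i - \hat{\alpha} - \hat{\beta} x_i \right]^2.$$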

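In the next theorem we give the distribution of two statistics that can
be used for testing hypotheses and constructing confidence intervals for the
regression parameters $\alpha$ and $\beta$.

Theorem 19.5. The statistics

$$Q_{\beta} = \frac{\hat{\beta} - \beta}{\hat{\sigma}} \sqrt{ \frac{(n - 2)\, S_{xx}}{n} }$$

and

$$Q_{\alpha} = \frac{\hat{\alpha} - \alpha}{\hat{\sigma}} \sqrt{ \frac{(n - 2)\, S_{xx}}{n\, (\bar{x})^2 + S_{xx}} }$$

both have a $t$-distribution with $n - 2$ degrees of freedom.

Proof: From Theorem 19.3, we know that

$$\hat{\beta} \sim N\left( \beta,\ \frac{\sigma^2}{S_{xx}} \right).$$

Hence by standardizing, we get

$$Z = \frac{\hat{\beta} - \beta}{\sqrt{ \frac{\sigma^2}{S_{xx}} }} \sim N(0, 1).$$

Further, we know that the likelihood estimator of $\sigma$ is

$$\hat{\sigma} = \sqrt{ \frac{1}{n} \left[ S_{YY} - \frac{S_{xY}}{S_{xx}}\, S_{xY} \right] }$$

and the distribution of the statistic $U = \frac{n\, \hat{\sigma}^2}{\sigma^2}$ is chi-square with $n - 2$ degrees
of freedom.

Since $Z = \frac{\hat{\beta} - \beta}{\sqrt{\sigma^2 / S_{xx}}} \sim N(0, 1)$ and $U = \frac{n\, \hat{\sigma}^2}{\sigma^2} \sim \chi^2(n - 2)$, by Theorem 14.6,
the statistic $\frac{Z}{\sqrt{U / (n - 2)}} \sim t(n - 2)$. Hence

$$Q_{\beta} = \frac{\hat{\beta} - \beta}{\hat{\sigma}} \sqrt{ \frac{(n - 2)\, S_{xx}}{n} } = \frac{ \dfrac{\hat{\beta} - \beta}{\sqrt{\sigma^2 / S_{xx}}} }{ \sqrt{ \dfrac{n\, \hat{\sigma}^2}{(n - 2)\, \sigma^2} } } \sim t(n - 2).$$

The statistic $Q_{\alpha}$ is handled in the same way, starting from the distribution of $\hat{\alpha}$
given in Theorem 19.3. This completes the proof of the theorem.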

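In the normal regression model, if $\beta = 0$, then $E(Y_x) = \alpha$. This implies
that $E(Y_x)$ does not depend on $x$. Therefore if $\beta \neq 0$, then $E(Y_x)$ is
dependent on $x$. Thus the null hypothesis $H_0 : \beta = 0$ should be tested against
$H_a : \beta \neq 0$. To devise a test we need the distribution of $\hat{\beta}$. Theorem 19.3 says
that $\hat{\beta}$ is normally distributed with mean $\beta$ and variance $\frac{\sigma^2}{S_{xx}}$. Therefore, we
have

$$Z = \frac{\hat{\beta} - \beta}{\sqrt{ \frac{\sigma^2}{S_{xx}} }} \sim N(0, 1).$$

In practice the variance $Var(Y_i/x_i)$, which is $\sigma^2$, is usually unknown. Hence
the above statistic $Z$ is not very useful. However, using the statistic $Q_{\beta}$,
we can devise a hypothesis test to test the hypothesis $H_0 : \beta = \beta_0$ against
$H_a : \beta \neq \beta_0$ at a significance level $\gamma$. For this one has to evaluate the quantity

$$|t| = \left| \frac{\hat{\beta} - \beta_0}{\sqrt{ \frac{n\, \hat{\sigma}^2}{(n - 2)\, S_{xx}} }} \right| = \left| \frac{\hat{\beta} - \beta_0}{\hat{\sigma}} \sqrt{ \frac{(n - 2)\, S_{xx}}{n} } \right|$$

and compare it to the quantile $t_{\gamma/2}(n - 2)$. The hypothesis test, at significance
level $\gamma$, is then "Reject $H_0 : \beta = \beta_0$ if $|t| > t_{\gamma/2}(n - 2)$".

The statistic

$$Q_{\beta} = \frac{\hat{\beta} - \beta}{\hat{\sigma}} \sqrt{ \frac{(n - 2)\, S_{xx}}{n} }$$

is a pivotal quantity for the parameter $\beta$ since the distribution of this quantity
$Q_{\beta}$ is a $t$-distribution with $n - 2$ degrees of freedom. Thus it can be used for
the construction of a $(1 - \gamma)100\%$ confidence interval for the parameter $\beta$ as
follows:

$$\begin{aligned}
1 - \gamma &= P\left( -t_{\gamma/2}(n - 2) \leq \frac{\hat{\beta} - \beta}{\hat{\sigma}} \sqrt{ \frac{(n - 2)\, S_{xx}}{n} } \leq t_{\gamma/2}(n - 2) \right) \\
&= P\left( \hat{\beta} - t_{\gamma/2}(n - 2)\, \hat{\sigma} \sqrt{ \frac{n}{(n - 2)\, S_{xx}} } \leq \beta \leq \hat{\beta} + t_{\gamma/2}(n - 2)\, \hat{\sigma} \sqrt{ \frac{n}{(n - 2)\, S_{xx}} } \right).
\end{aligned}$$

Hence, the $(1 - \gamma)100\%$ confidence interval for $\beta$ is given by

$$\left[ \hat{\beta} - t_{\gamma/2}(n - 2)\, \hat{\sigma} \sqrt{ \frac{n}{(n - 2)\, S_{xx}} },\ \ \hat{\beta} + t_{\gamma/2}(n - 2)\, \hat{\sigma} \sqrt{ \frac{n}{(n - 2)\, S_{xx}} } \right].$$

In a similar manner one can devise a hypothesis test for $\alpha$ and construct a
confidence interval for $\alpha$ using the statistic $Q_{\alpha}$. We leave these to the reader.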

Now we give two examples to illustrate how to find the normal regression 
line and related things. 

Example 19.7. The following data give the number of hours, $x$, which
ten persons studied for a French test and their scores, $y$, on the test:

    x     4     9    10    14     4     7    12    22     1    17
    y    31    58    65    73    37    44    60    91    21    84

Find the normal regression line that approximates the regression of test scores
on the number of hours studied. Further test the hypothesis $H_0 : \beta = 3$ versus
$H_a : \beta \neq 3$ at the significance level 0.02.

Answer: From the above data, we have

$$\sum_{i=1}^{10} x_i = 100, \qquad \sum_{i=1}^{10} x_i^2 = 1376, \qquad \sum_{i=1}^{10} y_i = 564, \qquad \sum_{i=1}^{10} y_i^2 = 36562, \qquad \sum_{i=1}^{10} x_i y_i = 6945,$$

$$S_{xx} = 376, \qquad S_{xy} = 1305, \qquad S_{yy} = 4752.4.$$

Hence

$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = 3.471 \qquad \text{and} \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\, \bar{x} = 21.690.$$

Thus the normal regression line is

$$y = 21.690 + 3.471\, x.$$

[Figure: Regression line $y = 21.690 + 3.471\, x$]

Now we test the hypothesis $H_0 : \beta = 3$ against $H_a : \beta \neq 3$ at 0.02 level
of significance. From the data, the maximum likelihood estimate of $\sigma$ is

$$\hat{\sigma} = \sqrt{ \frac{1}{n} \left[ S_{yy} - \frac{S_{xy}}{S_{xx}}\, S_{xy} \right] } = \sqrt{ \frac{1}{10} \left[ 4752.4 - (3.471)(1305) \right] } = 4.720$$

and

$$|t| = \left| \frac{3.471 - 3}{4.720} \sqrt{ \frac{(8)(376)}{10} } \right| = 1.73.$$

Hence

$$1.73 = |t| < t_{0.01}(8) = 2.896.$$

Thus we do not reject the null hypothesis $H_0 : \beta = 3$ at the significance
level 0.02.

This means that we cannot conclude that on the average an extra hour
of study will increase the score by more than 3 points.
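The computations of Example 19.7 can be reproduced from the raw data. The sketch below is an illustration only: the function names are hypothetical, and the critical value 2.896 is the table value $t_{0.01}(8)$ quoted in the text rather than something the code derives. Because the code does not round $\hat{\beta}$ before using it, the last digits can differ slightly from the hand computation.

```python
import math

hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 17]
scores = [31, 58, 65, 73, 37, 44, 60, 91, 21, 84]

def normal_regression(xs, ys):
    """Maximum likelihood estimates for the normal regression model."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    sigma_hat = math.sqrt((s_yy - beta_hat * s_xy) / n)
    return n, s_xx, alpha_hat, beta_hat, sigma_hat

def slope_t_statistic(beta_hat, beta_0, sigma_hat, n, s_xx):
    """Observed value of Q_beta under H_0: beta = beta_0."""
    return (beta_hat - beta_0) / sigma_hat * math.sqrt((n - 2) * s_xx / n)

n, s_xx, alpha_hat, beta_hat, sigma_hat = normal_regression(hours, scores)
t_value = slope_t_statistic(beta_hat, 3.0, sigma_hat, n, s_xx)
T_CRITICAL = 2.896  # t_{0.01}(8), taken from the t-table value quoted in the text

print(alpha_hat, beta_hat, sigma_hat, t_value)
print("reject H_0" if abs(t_value) > T_CRITICAL else "do not reject H_0")
```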

Example 19.8. The frequency of chirping of a cricket is thought to be
related to temperature. This suggests the possibility that temperature can
be estimated from the chirp frequency. The following data give the number of
chirps per second, $x$, by the striped ground cricket and the temperature, $y$, in
Fahrenheit:

    x    20    16    20    18    17    16    15    17    15    16
    y    89    72    93    84    81    75    70    82    69    83

Find the normal regression line that approximates the regression of
temperature on the number of chirps per second by the striped ground cricket. Further
test the hypothesis $H_0 : \beta = 4$ versus $H_a : \beta \neq 4$ at the significance level 0.1.

Answer: From the above data, we have

$$\sum_{i=1}^{10} x_i = 170, \qquad \sum_{i=1}^{10} x_i^2 = 2920, \qquad \sum_{i=1}^{10} y_i = 798, \qquad \sum_{i=1}^{10} y_i^2 = 64270, \qquad \sum_{i=1}^{10} x_i y_i = 13688,$$

$$S_{xx} = 30, \qquad S_{xy} = 122, \qquad S_{yy} = 589.6.$$

Hence

$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = 4.067 \qquad \text{and} \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\, \bar{x} = 79.8 - (4.067)(17) = 10.661.$$

Thus the normal regression line is

$$y = 10.661 + 4.067\, x.$$

[Figure: Regression line $y = 10.661 + 4.067\, x$]

Now we test the hypothesis $H_0 : \beta = 4$ against $H_a : \beta \neq 4$ at 0.1 level of
significance. From the data, the maximum likelihood estimate of $\sigma$ is

$$\hat{\sigma} = \sqrt{ \frac{1}{n} \left[ S_{yy} - \frac{S_{xy}}{S_{xx}}\, S_{xy} \right] } = \sqrt{ \frac{1}{10} \left[ 589.6 - (4.067)(122) \right] } = 3.057$$

and

$$|t| = \left| \frac{4.067 - 4}{3.057} \sqrt{ \frac{(8)(30)}{10} } \right| = 0.107.$$

Hence

$$0.107 = |t| < t_{0.05}(8) = 1.860.$$

Thus we do not reject the null hypothesis $H_0 : \beta = 4$ at a significance
level 0.1.

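Let $\mu_x = \alpha + \beta x$ and write $\hat{Y}_x = \hat{\alpha} + \hat{\beta} x$ for an arbitrary but fixed $x$.
Then $\hat{Y}_x$ is an estimator of $\mu_x$. The following theorem gives various properties
of this estimator.

Theorem 19.6. Let $x$ be an arbitrary but fixed real number. Then

(i) $\hat{Y}_x$ is a linear estimator of $Y_1, Y_2, ..., Y_n$,

(ii) $\hat{Y}_x$ is an unbiased estimator of $\mu_x$, and

(iii) $Var(\hat{Y}_x) = \left\{ \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}} \right\} \sigma^2$.

Proof: First we show that $\hat{Y}_x$ is a linear estimator of $Y_1, Y_2, ..., Y_n$. Since

$$\begin{aligned}
\hat{Y}_x &= \hat{\alpha} + \hat{\beta} x \\
&= \bar{Y} - \hat{\beta} \bar{x} + \hat{\beta} x \\
&= \bar{Y} + \hat{\beta}\, (x - \bar{x}) \\
&= \bar{Y} + \sum_{k=1}^{n} \frac{(x_k - \bar{x})(x - \bar{x})}{S_{xx}}\, Y_k \\
&= \sum_{k=1}^{n} \frac{Y_k}{n} + \sum_{k=1}^{n} \frac{(x_k - \bar{x})(x - \bar{x})}{S_{xx}}\, Y_k \\
&= \sum_{k=1}^{n} \left( \frac{1}{n} + \frac{(x_k - \bar{x})(x - \bar{x})}{S_{xx}} \right) Y_k,
\end{aligned}$$

$\hat{Y}_x$ is a linear estimator of $Y_1, Y_2, ..., Y_n$.

Next, we show that $\hat{Y}_x$ is an unbiased estimator of $\mu_x$. Since

$$\begin{aligned}
E(\hat{Y}_x) &= E(\hat{\alpha} + \hat{\beta} x) \\
&= E(\hat{\alpha}) + E(\hat{\beta} x) \\
&= \alpha + \beta x \\
&= \mu_x,
\end{aligned}$$

$\hat{Y}_x$ is an unbiased estimator of $\mu_x$.

Finally, we calculate the variance of $\hat{Y}_x$ using Theorem 19.3. The variance
of $\hat{Y}_x$ is given by

$$\begin{aligned}
Var(\hat{Y}_x) &= Var(\hat{\alpha} + \hat{\beta} x) \\
&= Var(\hat{\alpha}) + Var(\hat{\beta} x) + 2\, Cov(\hat{\alpha}, \hat{\beta} x) \\
&= \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right) \sigma^2 + x^2\, \frac{\sigma^2}{S_{xx}} + 2x\, Cov(\hat{\alpha}, \hat{\beta}) \\
&= \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right) \sigma^2 + x^2\, \frac{\sigma^2}{S_{xx}} - 2x\, \frac{\bar{x}\, \sigma^2}{S_{xx}} \\
&= \left( \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}} \right) \sigma^2.
\end{aligned}$$

In this computation we have used the fact that

$$Cov(\hat{\alpha}, \hat{\beta}) = -\frac{\bar{x}\, \sigma^2}{S_{xx}},$$

whose proof is left to the reader as an exercise. The proof of the theorem is
now complete.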


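By Theorem 19.3, we see that

$$\hat{\beta} \sim N\left( \beta,\ \frac{\sigma^2}{S_{xx}} \right) \qquad \text{and} \qquad \hat{\alpha} \sim N\left( \alpha,\ \frac{\sigma^2}{n} + \frac{\bar{x}^2 \sigma^2}{S_{xx}} \right).$$

Since $\hat{Y}_x = \hat{\alpha} + \hat{\beta} x$, the random variable $\hat{Y}_x$ is also a normal random variable
with mean $\mu_x$ and variance

$$Var(\hat{Y}_x) = \left( \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}} \right) \sigma^2.$$

Hence standardizing $\hat{Y}_x$, we have

$$\frac{\hat{Y}_x - \mu_x}{\sqrt{ Var(\hat{Y}_x) }} \sim N(0, 1).$$

If $\sigma^2$ is known, then one can take the statistic $Q = \frac{\hat{Y}_x - \mu_x}{\sqrt{Var(\hat{Y}_x)}}$ as a pivotal
quantity to construct a confidence interval for $\mu_x$. The $(1 - \gamma)100\%$ confidence
interval for $\mu_x$ when $\sigma^2$ is known is given by

$$\left[ \hat{Y}_x - z_{\gamma/2} \sqrt{ Var(\hat{Y}_x) },\ \ \hat{Y}_x + z_{\gamma/2} \sqrt{ Var(\hat{Y}_x) } \right].$$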
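Example 19.9. The following data give the number of chirps per second, $x$,
by the striped ground cricket and the temperature, $y$, in Fahrenheit:

    x    20    16    20    18    17    16    15    17    15    16
    y    89    72    93    84    81    75    70    82    69    83

What is the 95% confidence interval for $\beta$? What is the 95% confidence
interval for $\mu_x$ when $x = 14$ and $\sigma = 3.057$?

Answer: From the data (see Example 19.8), we have

$$n = 10, \qquad \hat{\beta} = 4.067, \qquad \hat{\sigma} = 3.057 \qquad \text{and} \qquad S_{xx} = 30.$$

The $(1 - \gamma)100\%$ confidence interval for $\beta$ is given by

$$\left[ \hat{\beta} - t_{\gamma/2}(n - 2)\, \hat{\sigma} \sqrt{ \frac{n}{(n - 2)\, S_{xx}} },\ \ \hat{\beta} + t_{\gamma/2}(n - 2)\, \hat{\sigma} \sqrt{ \frac{n}{(n - 2)\, S_{xx}} } \right].$$

Since $\hat{\sigma} \sqrt{ \frac{n}{(n - 2)\, S_{xx}} } = 3.057 \sqrt{ \frac{10}{(8)(30)} } = 0.6240$, the 95% confidence interval for $\beta$ is

$$\left[ 4.067 - t_{0.025}(8)\, (0.6240),\ \ 4.067 + t_{0.025}(8)\, (0.6240) \right].$$

Since from the $t$-table, we have $t_{0.025}(8) = 2.306$, the 95% confidence interval
for $\beta$ becomes

$$\left[ 4.067 - (2.306)(0.6240),\ \ 4.067 + (2.306)(0.6240) \right]$$

which is $[2.628,\ 5.506]$.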

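If the variance $\sigma^2$ is not known, then we can use the fact that the statistic
$U = \frac{n\, \hat{\sigma}^2}{\sigma^2}$ is chi-square with $n - 2$ degrees of freedom to obtain a pivotal
quantity for $\mu_x$. This can be done as follows:

$$Q = \frac{\hat{Y}_x - \mu_x}{\hat{\sigma}} \sqrt{ \frac{(n - 2)\, S_{xx}}{S_{xx} + n\, (x - \bar{x})^2} } = \frac{ \dfrac{\hat{Y}_x - \mu_x}{\sqrt{ \left( \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}} \right) \sigma^2 }} }{ \sqrt{ \dfrac{n\, \hat{\sigma}^2}{(n - 2)\, \sigma^2} } } \sim t(n - 2).$$

Using this pivotal quantity one can construct a $(1 - \gamma)100\%$ confidence
interval for the mean $\mu_x$ as

$$\left[ \hat{Y}_x - t_{\gamma/2}(n - 2)\, \hat{\sigma} \sqrt{ \frac{S_{xx} + n\, (x - \bar{x})^2}{(n - 2)\, S_{xx}} },\ \ \hat{Y}_x + t_{\gamma/2}(n - 2)\, \hat{\sigma} \sqrt{ \frac{S_{xx} + n\, (x - \bar{x})^2}{(n - 2)\, S_{xx}} } \right].$$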

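Next we determine the 95% confidence interval for $\mu_x$ when $x = 14$ and
$\sigma = 3.057$. The $(1 - \gamma)100\%$ confidence interval for $\mu_x$ when $\sigma^2$ is known is
given by

$$\left[ \hat{Y}_x - z_{\gamma/2} \sqrt{ Var(\hat{Y}_x) },\ \ \hat{Y}_x + z_{\gamma/2} \sqrt{ Var(\hat{Y}_x) } \right].$$

From the data, we have $\bar{x} = 17$, $S_{xx} = 30$, $\hat{\alpha} = 10.661$ and $\hat{\beta} = 4.067$, so that

$$\hat{Y}_x = \hat{\alpha} + \hat{\beta} x = 10.661 + (4.067)(14) = 67.599$$

and

$$Var(\hat{Y}_x) = \left( \frac{1}{10} + \frac{(14 - 17)^2}{30} \right) \sigma^2 = (0.4)\, (3.057)^2 = 3.738.$$

The 95% confidence interval for $\mu_x$ is given by

$$\left[ 67.599 - z_{0.025} \sqrt{3.738},\ \ 67.599 + z_{0.025} \sqrt{3.738} \right]$$

and since $z_{0.025} = 1.96$ (from the normal table), we have

$$\left[ 67.599 - (1.96)(1.933),\ \ 67.599 + (1.96)(1.933) \right]$$

which is $[63.810,\ 71.388]$.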

We now consider the predictions made by the normal regression equation
$\hat{Y}_x = \hat{\alpha} + \hat{\beta} x$. The quantity $\hat{Y}_x$ gives an estimate of $\mu_x = \alpha + \beta x$. Each
time we compute a regression line from a random sample we are observing
one possible linear equation in a population consisting of all possible linear
equations. Further, the actual value of $Y_x$ that will be observed for a given
value of $x$ is normal with mean $\alpha + \beta x$ and variance $\sigma^2$. So the actual
observed value will be different from $\mu_x$. Thus, the predicted value for $Y_x$
will be in error from two different sources, namely (1) $\hat{\alpha}$ and $\hat{\beta}$ are randomly
distributed about $\alpha$ and $\beta$, and (2) $Y_x$ is randomly distributed about $\mu_x$.
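Let $y_x$ denote the actual value of $Y_x$ that will be observed for the value
$x$ and consider the random variable

$$D = Y_x - \hat{\alpha} - \hat{\beta} x.$$

Since $D$ is a linear combination of normal random variables, $D$ is also a
normal random variable.

The mean of $D$ is given by

$$\begin{aligned}
E(D) &= E(Y_x) - E(\hat{\alpha}) - x\, E(\hat{\beta}) \\
&= \alpha + \beta x - \alpha - x \beta \\
&= 0.
\end{aligned}$$

The variance of $D$ is given by

$$\begin{aligned}
Var(D) &= Var(Y_x - \hat{\alpha} - \hat{\beta} x) \\
&= Var(Y_x) + Var(\hat{\alpha}) + x^2\, Var(\hat{\beta}) + 2x\, Cov(\hat{\alpha}, \hat{\beta}) \\
&= \sigma^2 + \frac{\sigma^2}{n} + \frac{\bar{x}^2 \sigma^2}{S_{xx}} + x^2\, \frac{\sigma^2}{S_{xx}} - 2x\, \frac{\bar{x}\, \sigma^2}{S_{xx}} \\
&= \sigma^2 + \frac{\sigma^2}{n} + \frac{(x - \bar{x})^2 \sigma^2}{S_{xx}} \\
&= \frac{(n + 1)\, S_{xx} + n\, (x - \bar{x})^2}{n\, S_{xx}}\, \sigma^2.
\end{aligned}$$

Therefore

$$D \sim N\left( 0,\ \frac{(n + 1)\, S_{xx} + n\, (x - \bar{x})^2}{n\, S_{xx}}\, \sigma^2 \right).$$

We standardize $D$ to get

$$Z = \frac{D - 0}{\sqrt{ \frac{(n + 1)\, S_{xx} + n\, (x - \bar{x})^2}{n\, S_{xx}}\, \sigma^2 }} \sim N(0, 1).$$

Since in practice the variance of $Y_x$, which is $\sigma^2$, is unknown, we cannot use
$Z$ to construct a confidence interval for a predicted value $y_x$.

We know that $U = \frac{n\, \hat{\sigma}^2}{\sigma^2} \sim \chi^2(n - 2)$. By Theorem 14.6, the statistic
$\frac{Z}{\sqrt{U / (n - 2)}} \sim t(n - 2)$. Hence

$$Q = \frac{y_x - \hat{\alpha} - \hat{\beta} x}{\hat{\sigma}} \sqrt{ \frac{(n - 2)\, S_{xx}}{(n + 1)\, S_{xx} + n\, (x - \bar{x})^2} } = \frac{ \dfrac{D - 0}{\sqrt{Var(D)}} }{ \sqrt{ \dfrac{n\, \hat{\sigma}^2}{(n - 2)\, \sigma^2} } } \sim t(n - 2).$$

The statistic $Q$ is a pivotal quantity for the predicted value $y_x$ and one can
use it to construct a $(1 - \gamma)100\%$ confidence interval for $y_x$. The $(1 - \gamma)100\%$
confidence interval, $[a, b]$, for $y_x$ is given by

$$\begin{aligned}
1 - \gamma &= P\left( -t_{\gamma/2}(n - 2) \leq Q \leq t_{\gamma/2}(n - 2) \right) \\
&= P(a \leq y_x \leq b),
\end{aligned}$$

where

$$a = \hat{\alpha} + \hat{\beta} x - t_{\gamma/2}(n - 2)\, \hat{\sigma} \sqrt{ \frac{(n + 1)\, S_{xx} + n\, (x - \bar{x})^2}{(n - 2)\, S_{xx}} }$$

and

$$b = \hat{\alpha} + \hat{\beta} x + t_{\gamma/2}(n - 2)\, \hat{\sigma} \sqrt{ \frac{(n + 1)\, S_{xx} + n\, (x - \bar{x})^2}{(n - 2)\, S_{xx}} }.$$

This confidence interval for $y_x$ is usually known as the prediction interval for
the predicted value $y_x$ based on the given $x$. The prediction interval represents an
interval that has a probability equal to $1 - \gamma$ of containing not a parameter but
a future value $y_x$ of the random variable $Y_x$. In many instances the prediction
interval is more relevant to a scientist or engineer than the confidence interval
on the mean $\mu_x$.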


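Example 19.10. The following data give the number of chirps per second, $x$,
by the striped ground cricket and the temperature, $y$, in Fahrenheit:

    x    20    16    20    18    17    16    15    17    15    16
    y    89    72    93    84    81    75    70    82    69    83

What is the 95% prediction interval for $y_x$ when $x = 14$?

Answer: From the data (see Example 19.8), we have

$$n = 10, \qquad \bar{x} = 17, \qquad \hat{\beta} = 4.067, \qquad \hat{\alpha} = 10.661, \qquad \hat{\sigma} = 3.057 \qquad \text{and} \qquad S_{xx} = 30.$$

Thus the normal regression line is

$$y_x = 10.661 + 4.067\, x.$$

Since $x = 14$, the corresponding predicted value $y_x$ is given by

$$y_x = 10.661 + (4.067)(14) = 67.599.$$

Therefore

$$\begin{aligned}
a &= \hat{\alpha} + \hat{\beta} x - t_{\gamma/2}(n - 2)\, \hat{\sigma} \sqrt{ \frac{(n + 1)\, S_{xx} + n\, (x - \bar{x})^2}{(n - 2)\, S_{xx}} } \\
&= 67.599 - t_{0.025}(8)\, (3.057) \sqrt{ \frac{(11)(30) + (10)(9)}{(8)(30)} } \\
&= 67.599 - (2.306)(3.057)(1.3229) \\
&= 58.273.
\end{aligned}$$

Similarly

$$\begin{aligned}
b &= \hat{\alpha} + \hat{\beta} x + t_{\gamma/2}(n - 2)\, \hat{\sigma} \sqrt{ \frac{(n + 1)\, S_{xx} + n\, (x - \bar{x})^2}{(n - 2)\, S_{xx}} } \\
&= 67.599 + t_{0.025}(8)\, (3.057) \sqrt{ \frac{(11)(30) + (10)(9)}{(8)(30)} } \\
&= 67.599 + (2.306)(3.057)(1.3229) \\
&= 76.925.
\end{aligned}$$

Hence the 95% prediction interval for $y_x$ when $x = 14$ is $[58.273,\ 76.925]$.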
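The interval formulas for the slope and for a future observation can be evaluated directly from the cricket data. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the constant 2.306 is the table value $t_{0.025}(8)$ quoted in the text, not a quantity computed by the code. Every other number is derived from the raw data without intermediate rounding, so results may differ in the last digits from a hand computation that rounds $\hat{\beta}$ and $\hat{\sigma}$ first.

```python
import math

chirps = [20, 16, 20, 18, 17, 16, 15, 17, 15, 16]
temps = [89, 72, 93, 84, 81, 75, 70, 82, 69, 83]
T_CRIT = 2.306  # t_{0.025}(8), the table value quoted in the text

def fit_normal_regression(xs, ys):
    """Return n, x_bar, S_xx, alpha_hat, beta_hat, sigma_hat (maximum likelihood)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    sigma_hat = math.sqrt((s_yy - beta_hat * s_xy) / n)
    return n, x_bar, s_xx, alpha_hat, beta_hat, sigma_hat

def slope_interval(xs, ys, t_crit):
    """Confidence interval for beta based on the pivotal quantity Q_beta."""
    n, _, s_xx, _, beta_hat, sigma_hat = fit_normal_regression(xs, ys)
    half = t_crit * sigma_hat * math.sqrt(n / ((n - 2) * s_xx))
    return beta_hat - half, beta_hat + half

def prediction_interval(xs, ys, x_new, t_crit):
    """Prediction interval [a, b] for a future observation y_x at x = x_new."""
    n, x_bar, s_xx, alpha_hat, beta_hat, sigma_hat = fit_normal_regression(xs, ys)
    center = alpha_hat + beta_hat * x_new
    scale = math.sqrt(((n + 1) * s_xx + n * (x_new - x_bar) ** 2) / ((n - 2) * s_xx))
    half = t_crit * sigma_hat * scale
    return center - half, center + half

print(slope_interval(chirps, temps, T_CRIT))
print(prediction_interval(chirps, temps, 14, T_CRIT))
```

The prediction interval is wider than a confidence interval for the mean $\mu_x$ at the same $x$, because it also accounts for the variance $\sigma^2$ of the future observation itself.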

19.3. The Correlation Analysis 

In the first two sections of this chapter, we examined the regression
problem and did an in-depth study of the least squares and the normal
regression analysis. In the regression analysis, we assumed that the values
of $X$ are not random variables, but are fixed. However, the values of $Y_x$ for
a given value of $x$ are randomly distributed about $E(Y_x) = \mu_x = \alpha + \beta x$.
Further, letting $\epsilon$ be a random variable with $E(\epsilon) = 0$ and $Var(\epsilon) = \sigma^2$,
one can model the so-called regression problem by

$$Y_x = \alpha + \beta x + \epsilon.$$

In this section, we examine the correlation problem. Unlike the
regression problem, here both $X$ and $Y$ are random variables and the correlation
problem can be modeled by

$$E(Y) = \alpha + \beta\, E(X).$$

From an experimental point of view this means that we are observing a random
vector $(X, Y)$ drawn from some bivariate population.

Recall that if $(X, Y)$ is a bivariate random variable then the correlation
coefficient $\rho$ is defined as

$$\rho = \frac{ E\left( (X - \mu_X)(Y - \mu_Y) \right) }{ \sqrt{ E\left( (X - \mu_X)^2 \right) E\left( (Y - \mu_Y)^2 \right) } }$$

where $\mu_X$ and $\mu_Y$ are the means of the random variables $X$ and $Y$,
respectively.

Definition 19.1. If $(X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n)$ is a random sample from
a bivariate population, then the sample correlation coefficient is defined as

$$R = \frac{ \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) }{ \sqrt{ \sum_{i=1}^{n} (X_i - \bar{X})^2 }\ \sqrt{ \sum_{i=1}^{n} (Y_i - \bar{Y})^2 } }.$$

The corresponding quantity computed from data $(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)$
will be denoted by $r$ and it is an estimate of the correlation coefficient $\rho$.

Now we give a geometrical interpretation of the sample correlation
coefficient based on a paired data set $\{(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)\}$. We can
associate this data set with two vectors $\vec{x} = (x_1, x_2, ..., x_n)$ and $\vec{y} = (y_1, y_2, ..., y_n)$
in $\mathbb{R}^n$. Let $L$ be the subset $\{\lambda \vec{e} \mid \lambda \in \mathbb{R}\}$ of $\mathbb{R}^n$, where $\vec{e} = (1, 1, ..., 1) \in \mathbb{R}^n$.
Consider the linear space $V$ given by $\mathbb{R}^n$ modulo $L$, that is $V = \mathbb{R}^n / L$. The
linear space $V$ is illustrated in a figure on the next page when $n = 2$.

We denote the equivalence class associated with the vector $\vec{x}$ by $[\vec{x}]$. In
the linear space $V$ it can be shown that the points $(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)$
are collinear if and only if the vectors $[\vec{x}]$ and $[\vec{y}]$ in $V$ are proportional.
We define an inner product on this linear space $V$ by

$$\langle [\vec{x}], [\vec{y}] \rangle = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}).$$

Then the angle $\theta$ between the vectors $[\vec{x}]$ and $[\vec{y}]$ is given by

$$\cos(\theta) = \frac{ \langle [\vec{x}], [\vec{y}] \rangle }{ \sqrt{ \langle [\vec{x}], [\vec{x}] \rangle }\ \sqrt{ \langle [\vec{y}], [\vec{y}] \rangle } }$$

which is

$$\cos(\theta) = \frac{ \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) }{ \sqrt{ \sum_{i=1}^{n} (x_i - \bar{x})^2 }\ \sqrt{ \sum_{i=1}^{n} (y_i - \bar{y})^2 } } = r.$$

Thus the sample correlation coefficient $r$ can be interpreted geometrically as
the cosine of the angle between the vectors $[\vec{x}]$ and $[\vec{y}]$. From this viewpoint
the following theorem is obvious.
Theorem 19.7. The sample correlation coefficient $r$ satisfies the inequality

$$-1 \leq r \leq 1.$$

The sample correlation coefficient $r = \pm 1$ if and only if the set of points
$\{(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)\}$ for $n \geq 3$ are collinear.
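The cosine interpretation of $r$ can be made concrete in a few lines. In the sketch below, which is an illustration with a hypothetical function name and made-up collinear points, the centered vectors play the role of representatives of the classes $[\vec{x}]$ and $[\vec{y}]$, and $r$ is their inner product divided by the product of their lengths.

```python
import math

def sample_correlation(xs, ys):
    """Sample correlation r as the cosine of the angle between centered vectors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [x - x_bar for x in xs]   # centered representative of the class of x
    dy = [y - y_bar for y in ys]   # centered representative of the class of y
    inner = sum(a * b for a, b in zip(dx, dy))
    norm_x = math.sqrt(sum(a * a for a in dx))
    norm_y = math.sqrt(sum(b * b for b in dy))
    return inner / (norm_x * norm_y)

# Collinear points with positive and with negative slope give r = +1 and r = -1.
print(sample_correlation([1, 2, 3, 4], [3, 5, 7, 9]))
print(sample_correlation([1, 2, 3, 4], [9, 7, 5, 3]))
# Points that are not collinear give a value strictly between -1 and 1.
print(sample_correlation([1, 2, 3, 4], [2, 1, 4, 3]))
```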

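To do some statistical analysis, we assume that the paired data is a
random sample of size $n$ from a bivariate normal population $(X, Y) \sim
BVN(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$. Then the conditional distribution of the random
variable $Y$ given $X = x$ is normal, that is

$$Y|_x \sim N\left( \mu_2 + \rho \frac{\sigma_2}{\sigma_1} (x - \mu_1), \; \sigma_2^2 (1 - \rho^2) \right).$$

This can be viewed as a normal regression model $E(Y|_x) = \alpha + \beta x$ where

$$\alpha = \mu_2 - \rho \frac{\sigma_2}{\sigma_1} \mu_1, \quad \beta = \rho \frac{\sigma_2}{\sigma_1}, \quad \text{and} \quad Var(Y|_x) = \sigma_2^2 (1 - \rho^2).$$

Since $\beta = \rho \frac{\sigma_2}{\sigma_1}$, if $\rho = 0$, then $\beta = 0$. Hence the null hypothesis $H_o: \rho = 0$
is equivalent to $H_o: \beta = 0$. In the previous section, we devised a hypothesis
test for testing $H_o: \beta = \beta_o$ against $H_a: \beta \neq \beta_o$. This hypothesis test, at
significance level $\gamma$, is "Reject $H_o: \beta = \beta_o$ if $|t| \geq t_{\gamma/2}(n - 2)$", where

$$t = \frac{\hat{\beta} - \beta_o}{\hat{\sigma}} \sqrt{\frac{(n - 2) S_{xx}}{n}}.$$

If $\beta_o = 0$, then we have

$$t = \frac{\hat{\beta}}{\hat{\sigma}} \sqrt{\frac{(n - 2) S_{xx}}{n}}.$$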

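Now we express $t$ in terms of the sample correlation coefficient $r$. Recall that

$$\hat{\beta} = \frac{S_{xy}}{S_{xx}}, \qquad (11)$$

$$\hat{\sigma}^2 = \frac{1}{n} \left[ S_{yy} - \frac{S_{xy}}{S_{xx}} S_{xy} \right], \qquad (12)$$

and

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}. \qquad (13)$$

Now using (11), (12), and (13), we compute

$$t = \frac{\hat{\beta}}{\hat{\sigma}} \sqrt{\frac{(n - 2) S_{xx}}{n}}
= \frac{S_{xy}}{S_{xx}} \, \frac{\sqrt{\frac{(n - 2) S_{xx}}{n}}}{\sqrt{\frac{1}{n} \left[ S_{yy} - \frac{S_{xy}}{S_{xx}} S_{xy} \right]}}
= \sqrt{n - 2} \; \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \; \frac{1}{\sqrt{1 - \frac{S_{xy}}{S_{xx}} \frac{S_{xy}}{S_{yy}}}}
= \sqrt{n - 2} \; \frac{r}{\sqrt{1 - r^2}}.$$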

Hence the test of the null hypothesis $H_o: \rho = 0$ against $H_a: \rho \neq 0$, at
significance level $\gamma$, is "Reject $H_o: \rho = 0$ if $|t| \geq t_{\gamma/2}(n - 2)$", where
$t = \sqrt{n - 2} \; \frac{r}{\sqrt{1 - r^2}}$.

The above test does not extend to values of $\rho$ other than $\rho = 0$.
However, tests for nonzero values of $\rho$ can be obtained from the following
result.

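Theorem 19.8. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from
a bivariate normal population $(X, Y) \sim BVN(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$. If

$$V = \frac{1}{2} \ln\left( \frac{1 + R}{1 - R} \right) \quad \text{and} \quad m = \frac{1}{2} \ln\left( \frac{1 + \rho}{1 - \rho} \right),$$

then

$$Z = \sqrt{n - 3} \, (V - m) \to N(0, 1) \quad \text{as } n \to \infty.$$

This theorem says that the statistic $V$ is approximately normal with
mean $m$ and variance $\frac{1}{n - 3}$ when $n$ is large. This statistic can be used to
devise a hypothesis test for nonzero values of $\rho$. Hence the test of the null
hypothesis $H_o: \rho = \rho_o$ against $H_a: \rho \neq \rho_o$, at significance level $\gamma$, is "Reject
$H_o: \rho = \rho_o$ if $|z| \geq z_{\gamma/2}$", where $z = \sqrt{n - 3} \, (V - m_o)$ and $m_o = \frac{1}{2} \ln\left( \frac{1 + \rho_o}{1 - \rho_o} \right)$.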
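
The test based on Theorem 19.8 can be sketched in a few lines. This is an illustrative sketch only: the function name `fisher_z_test` and the numbers $r = 0.6$, $n = 28$, $\rho_o = 0.3$ are hypothetical and do not come from the text; the standard normal quantile is taken from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def fisher_z_test(r, n, rho_0, gamma):
    """Two-sided test of H_o: rho = rho_0 using the statistic V of Theorem 19.8."""
    v = 0.5 * math.log((1 + r) / (1 - r))            # observed value of V
    m_0 = 0.5 * math.log((1 + rho_0) / (1 - rho_0))  # value of m under H_o
    z = math.sqrt(n - 3) * (v - m_0)
    z_crit = NormalDist().inv_cdf(1 - gamma / 2)     # z_{gamma/2}
    return z, z_crit, abs(z) >= z_crit

# Hypothetical inputs, for illustration only.
z, z_crit, reject = fisher_z_test(r=0.6, n=28, rho_0=0.3, gamma=0.05)
```

With these assumed inputs $z$ is about $1.92$, just below $z_{0.025} \approx 1.96$, so the sketch does not reject $H_o$. Because the result is asymptotic, the approximation should be expected to be rough for small $n$.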

Example 19.11. The following data were obtained in a study of the relationship
between the weight and chest size of infants at birth:

    x, weight in kg        2.76   2.17   5.53   4.31   2.30   3.70
    y, chest size in cm    29.5   26.3   36.6   27.8   28.3   28.6

Determine the sample correlation coefficient $r$ and then test the null hypothesis
$H_o: \rho = 0$ against the alternative hypothesis $H_a: \rho \neq 0$ at a significance
level 0.01.

Answer: From the above data we find that

$$\bar{x} = 3.46 \quad \text{and} \quad \bar{y} = 29.51.$$

Next, we compute $S_{xx}$, $S_{yy}$ and $S_{xy}$ using a tabular representation.

    $x - \bar{x}$   $y - \bar{y}$   $(x - \bar{x})(y - \bar{y})$   $(x - \bar{x})^2$   $(y - \bar{y})^2$
       -0.70           -0.01              0.007                       0.490               0.000
       -1.29           -3.21              4.141                       1.664              10.304
        2.07            7.09             14.676                       4.285              50.268
        0.85           -1.71             -1.453                       0.722               2.924
       -1.16           -1.21              1.404                       1.346               1.464
        0.24           -0.91             -0.218                       0.058               0.828
                                  $S_{xy} = 18.557$             $S_{xx} = 8.565$    $S_{yy} = 65.788$

Hence, the correlation coefficient $r$ is given by

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{18.557}{\sqrt{(8.565)(65.788)}} = 0.782.$$

The computed $t$ value is given by

$$t = \sqrt{n - 2} \; \frac{r}{\sqrt{1 - r^2}} = \sqrt{(6 - 2)} \; \frac{0.782}{\sqrt{1 - (0.782)^2}} = 2.509.$$

From the t-table we have $t_{0.005}(4) = 4.604$. Since

$$2.509 = |t| \not\geq t_{0.005}(4) = 4.604$$

we do not reject the null hypothesis $H_o: \rho = 0$.
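
The arithmetic of Example 19.11 can be reproduced with a short script. This is a sketch under the assumption that the critical value $t_{0.005}(4) = 4.604$ is simply copied from the t-table quoted above; the variable names are illustrative.

```python
import math

# Data of Example 19.11.
weight = [2.76, 2.17, 5.53, 4.31, 2.30, 3.70]   # x, weight in kg
chest = [29.5, 26.3, 36.6, 27.8, 28.3, 28.6]    # y, chest size in cm

n = len(weight)
x_bar = sum(weight) / n
y_bar = sum(chest) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, chest))
s_xx = sum((x - x_bar) ** 2 for x in weight)
s_yy = sum((y - y_bar) ** 2 for y in chest)

r = s_xy / math.sqrt(s_xx * s_yy)
t = math.sqrt(n - 2) * r / math.sqrt(1 - r * r)

t_crit = 4.604              # t_{0.005}(4), taken from the t-table in the text
reject = abs(t) >= t_crit
```

Carrying full precision gives $r \approx 0.782$ and $t \approx 2.507$; the small difference from the printed $2.509$ comes from rounding $r$ before computing $t$. The conclusion (do not reject $H_o$) is unchanged.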

19.4. Review Exercises 

1. Let $Y_1, Y_2, \ldots, Y_n$ be $n$ independent random variables such that each
$Y_i \sim N(\beta x_i, \sigma^2)$, where both $\beta$ and $\sigma^2$ are unknown parameters. If
$\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ is a data set where $y_1, y_2, \ldots, y_n$ are the
observed values based on $x_1, x_2, \ldots, x_n$, then find the maximum likelihood
estimators $\hat{\beta}$ and $\hat{\sigma}^2$ of $\beta$ and $\sigma^2$.

2. Let $Y_1, Y_2, \ldots, Y_n$ be $n$ independent random variables such that each
$Y_i \sim N(\beta x_i, \sigma^2)$, where both $\beta$ and $\sigma^2$ are unknown parameters. If
$\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ is a data set where $y_1, y_2, \ldots, y_n$ are the
observed values based on $x_1, x_2, \ldots, x_n$, then show that the maximum likelihood
estimator of $\beta$ is normally distributed. What are the mean and variance of
$\hat{\beta}$?

3. Let $Y_1, Y_2, \ldots, Y_n$ be $n$ independent random variables such that each
$Y_i \sim N(\beta x_i, \sigma^2)$, where both $\beta$ and $\sigma^2$ are unknown parameters. If
$\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ is a data set where $y_1, y_2, \ldots, y_n$ are the
observed values based on $x_1, x_2, \ldots, x_n$, then find an unbiased estimator $\hat{\sigma}^2$ of
$\sigma^2$ and then find a constant $k$ such that $k \hat{\sigma}^2 \sim \chi^2(2n)$.

4. Let $Y_1, Y_2, \ldots, Y_n$ be $n$ independent random variables such that each
$Y_i \sim N(\beta x_i, \sigma^2)$, where both $\beta$ and $\sigma^2$ are unknown parameters. If
$\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ is a data set where $y_1, y_2, \ldots, y_n$ are the
observed values based on $x_1, x_2, \ldots, x_n$, then find a pivotal quantity for $\beta$ and
using this pivotal quantity construct a $(1 - \gamma)100\%$ confidence interval for $\beta$.

5. Let $Y_1, Y_2, \ldots, Y_n$ be $n$ independent random variables such that each
$Y_i \sim N(\beta x_i, \sigma^2)$, where both $\beta$ and $\sigma^2$ are unknown parameters. If
$\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ is a data set where $y_1, y_2, \ldots, y_n$ are the
observed values based on $x_1, x_2, \ldots, x_n$, then find a pivotal quantity for $\sigma^2$ and
using this pivotal quantity construct a $(1 - \gamma)100\%$ confidence interval for
$\sigma^2$.

6. Let $Y_1, Y_2, \ldots, Y_n$ be $n$ independent random variables such that
each $Y_i \sim EXP(\beta x_i)$, where $\beta$ is an unknown parameter. If
$\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ is a data set where $y_1, y_2, \ldots, y_n$ are the
observed values based on $x_1, x_2, \ldots, x_n$, then find the maximum likelihood
estimator $\hat{\beta}$ of $\beta$.

7. Let $Y_1, Y_2, \ldots, Y_n$ be $n$ independent random variables such that
each $Y_i \sim EXP(\beta x_i)$, where $\beta$ is an unknown parameter. If
$\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ is a data set where $y_1, y_2, \ldots, y_n$ are the
observed values based on $x_1, x_2, \ldots, x_n$, then find the least squares estimator
$\hat{\beta}$ of $\beta$.

8. Let $Y_1, Y_2, \ldots, Y_n$ be $n$ independent random variables such that
each $Y_i \sim POI(\beta x_i)$, where $\beta$ is an unknown parameter. If
$\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ is a data set where $y_1, y_2, \ldots, y_n$ are the
observed values based on $x_1, x_2, \ldots, x_n$, then find the maximum likelihood
estimator $\hat{\beta}$ of $\beta$.

9. Let $Y_1, Y_2, \ldots, Y_n$ be $n$ independent random variables such that
each $Y_i \sim POI(\beta x_i)$, where $\beta$ is an unknown parameter. If
$\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ is a data set where $y_1, y_2, \ldots, y_n$ are the
observed values based on $x_1, x_2, \ldots, x_n$, then find the least squares estimator
$\hat{\beta}$ of $\beta$.

10. Let $Y_1, Y_2, \ldots, Y_n$ be $n$ independent random variables such that
each $Y_i \sim POI(\beta x_i)$, where $\beta$ is an unknown parameter. If
$\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ is a data set where $y_1, y_2, \ldots, y_n$ are the
observed values based on $x_1, x_2, \ldots, x_n$, show that the least squares estimator
and the maximum likelihood estimator of $\beta$ are both unbiased estimators of
$\beta$.

11. Let $Y_1, Y_2, \ldots, Y_n$ be $n$ independent random variables such that
each $Y_i \sim POI(\beta x_i)$, where $\beta$ is an unknown parameter. If
$\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ is a data set where $y_1, y_2, \ldots, y_n$ are the
observed values based on $x_1, x_2, \ldots, x_n$, then find the variances of both the least
squares estimator and the maximum likelihood estimator of $\beta$.

12. Given the five pairs of points $(x, y)$ shown below:

    x      10       20      30      40      50
    y    50.071   0.078   0.112   0.120   0.131

What is the curve of the form $y = a + bx + cx^2$ that best fits the data by the method
of least squares?

13. Given the five pairs of points $(x, y)$ shown below:

    x     4    7    9   10   11
    y    10   16   22   20   25

What is the curve of the form $y = a + bx$ that best fits the data by the method of
least squares?

14. The following data were obtained from the grades of six students selected
at random:

    Mathematics Grade, x    72   94   82   74   65   85
    English Grade, y        76   86   65   89   80   92

Find the sample correlation coefficient $r$ and then test the null hypothesis
$H_o: \rho = 0$ against the alternative hypothesis $H_a: \rho \neq 0$ at a significance
level 0.01.

15. Given a set of data $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, what is the least squares
estimate of $\alpha$ if $y = \alpha$ is fitted to this data set?

16. Given a set of data points $\{(2, 3), (4, 6), (5, 7)\}$, what is the curve of the
form $y = \alpha + \beta x^2$ that best fits the data by the method of least squares?

17. Given a data set $\{(1, 1), (2, 1), (2, 3), (3, 2), (4, 3)\}$ and $Y|_x \sim N(\alpha +
\beta x, \sigma^2)$, find the point estimate of $\sigma^2$ and then construct a 90% confidence
interval for $\sigma$.

18. For the data set $\{(1, 1), (2, 1), (2, 3), (3, 2), (4, 3)\}$ determine the correlation
coefficient $r$. Test the null hypothesis $H_o: \rho = 0$ versus $H_a: \rho \neq 0$ at a
significance level 0.01.

Chapter 20

ANALYSIS OF VARIANCE


In Chapter 19, we examined how a quantitative independent variable $x$
can be used for predicting the value of a quantitative dependent variable $y$. In
this chapter we would like to examine whether one or more independent (or
predictor) variables affect a dependent (or response) variable $y$. This chapter
differs from the last chapter because the independent variable may now
be either quantitative or qualitative. It also differs from the last chapter in
assuming that the response measurements were obtained for specific settings
of the independent variables. Selecting the settings of the independent variables
is another aspect of experimental design. It enables us to tell whether
changes in the independent variables cause changes in the mean response
and it permits us to analyze the data using a method known as analysis of
variance (or ANOVA). Sir Ronald Aylmer Fisher (1890-1962) developed the
analysis of variance in the 1920s and used it to analyze data from agricultural
experiments.

The ANOVA investigates independent measurements from several treatments
or levels of one or more factors (that is, the predictor variables).
The technique of ANOVA consists of partitioning the total sum of
squares into component sums of squares due to the different factors and the error.
For instance, suppose there are $Q$ factors. Then the total sum of squares
($SS_T$) is partitioned as

$$SS_T = SS_A + SS_B + \cdots + SS_Q + SS_{Error},$$

where $SS_A$, $SS_B$, ..., and $SS_Q$ represent the sums of squares associated with
the factors $A$, $B$, ..., and $Q$, respectively. If the ANOVA involves only one
factor, then it is called the one-way analysis of variance. Similarly, if it involves
two factors, then it is called the two-way analysis of variance. If it involves
more than two factors, then the corresponding ANOVA is called the higher
order analysis of variance. In this chapter we only treat the one-way analysis
of variance.

The analysis of variance is a special case of the linear models that represent
the relationship between a continuous response variable $y$ and one or
more predictor variables (either continuous or categorical) in the form

$$y = X \beta + \epsilon \qquad (1)$$

where $y$ is an $m \times 1$ vector of observations of the response variable, $X$ is the
$m \times n$ design matrix determined by the predictor variables, $\beta$ is an $n \times 1$ vector
of parameters, and $\epsilon$ is an $m \times 1$ vector of random errors (or disturbances)
independent of each other and having a common distribution.

20.1. One-Way Analysis of Variance with Equal Sample Sizes 

The standard model of one-way ANOVA is given by

$$Y_{ij} = \mu_i + \epsilon_{ij} \quad \text{for } i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n, \qquad (2)$$

where $m \geq 2$ and $n \geq 2$. In this model, we assume that each random variable

$$Y_{ij} \sim N(\mu_i, \sigma^2) \quad \text{for } i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n. \qquad (3)$$

Note that because of (3), each $\epsilon_{ij}$ in model (2) is normally distributed with
mean zero and variance $\sigma^2$.

Suppose we are given $m$ independent samples, each of size $n$, where the members of the
$i$th sample, $Y_{i1}, Y_{i2}, \ldots, Y_{in}$, are normal random variables with mean $\mu_i$ and
unknown variance $\sigma^2$. That is,

$$Y_{ij} \sim N(\mu_i, \sigma^2), \quad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n.$$

We will be interested in testing the null hypothesis

$$H_o: \mu_1 = \mu_2 = \cdots = \mu_m = \mu$$

against the alternative hypothesis


$H_a$: not all the means are equal.

In the following theorem we present the maximum likelihood estimators
of the parameters $\mu_1, \mu_2, \ldots, \mu_m$ and $\sigma^2$.

Theorem 20.1. Suppose the one-way ANOVA model is given by the equation
(2) where the $\epsilon_{ij}$'s are independent and normally distributed random
variables with mean zero and variance $\sigma^2$ for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$.
Then the MLE's of the parameters $\mu_i$ ($i = 1, 2, \ldots, m$) and $\sigma^2$ of the model
are given by

$$\hat{\mu}_i = \bar{Y}_{i\bullet}, \quad i = 1, 2, \ldots, m,$$

$$\hat{\sigma}^2 = \frac{1}{nm} SS_W,$$

where $\bar{Y}_{i\bullet} = \frac{1}{n} \sum_{j=1}^{n} Y_{ij}$ and $SS_W = \sum_{i=1}^{m} \sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{i\bullet} \right)^2$ is the within samples
sum of squares.

where

$$\bar{Y}_{i\bullet} = \frac{1}{n} \sum_{j=1}^{n} Y_{ij}.$$

It can be checked that these solutions yield the maximum of the likelihood
function and we leave this verification to the reader. Thus the maximum
likelihood estimators of the model parameters are given by

$$\hat{\mu}_i = \bar{Y}_{i\bullet}, \quad i = 1, 2, \ldots, m,$$

$$\hat{\sigma}^2 = \frac{1}{nm} SS_W,$$

where $SS_W = \sum_{i=1}^{m} \sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{i\bullet} \right)^2$. The proof of the theorem is now complete.


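Define

$$\bar{Y}_{\bullet\bullet} = \frac{1}{nm} \sum_{i=1}^{m} \sum_{j=1}^{n} Y_{ij}. \qquad (7)$$

Further, define

$$SS_T = \sum_{i=1}^{m} \sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{\bullet\bullet} \right)^2 \qquad (8)$$

$$SS_W = \sum_{i=1}^{m} \sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{i\bullet} \right)^2 \qquad (9)$$

and

$$SS_B = \sum_{i=1}^{m} \sum_{j=1}^{n} \left( \bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet} \right)^2 \qquad (10)$$

Here $SS_T$ is the total sum of squares, $SS_W$ is the within sum of squares, and
$SS_B$ is the between sum of squares.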


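Next we consider the partitioning of the total sum of squares. The following
lemma gives us such a partition.

Lemma 20.1. The total sum of squares is equal to the sum of the within and
between sums of squares, that is

$$SS_T = SS_W + SS_B. \qquad (11)$$

Proof: Rewriting (8) we have

$$SS_T = \sum_{i=1}^{m} \sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{\bullet\bullet} \right)^2$$

$$= \sum_{i=1}^{m} \sum_{j=1}^{n} \left[ \left( Y_{ij} - \bar{Y}_{i\bullet} \right) + \left( \bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet} \right) \right]^2$$

$$= \sum_{i=1}^{m} \sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{i\bullet} \right)^2 + \sum_{i=1}^{m} \sum_{j=1}^{n} \left( \bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet} \right)^2 + 2 \sum_{i=1}^{m} \sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{i\bullet} \right) \left( \bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet} \right)$$

$$= SS_W + SS_B + 2 \sum_{i=1}^{m} \sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{i\bullet} \right) \left( \bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet} \right).$$

The cross-product term vanishes, that is

$$\sum_{i=1}^{m} \sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{i\bullet} \right) \left( \bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet} \right) = \sum_{i=1}^{m} \left( \bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet} \right) \sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{i\bullet} \right) = 0.$$

Hence we obtain the asserted result $SS_T = SS_W + SS_B$ and the proof of the
lemma is complete.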
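
The partition in Lemma 20.1 is easy to confirm numerically. The sketch below computes the three sums of squares from their definitions; the function name `sums_of_squares` and the three small samples are hypothetical and serve only as an illustration.

```python
def sums_of_squares(samples):
    """Return (SS_T, SS_W, SS_B) for a list of samples, following (8), (9), (10)."""
    all_values = [y for sample in samples for y in sample]
    grand_mean = sum(all_values) / len(all_values)
    ss_t = sum((y - grand_mean) ** 2 for y in all_values)
    ss_w = 0.0
    ss_b = 0.0
    for sample in samples:
        sample_mean = sum(sample) / len(sample)
        ss_w += sum((y - sample_mean) ** 2 for y in sample)
        # The inner sum over j in (10) repeats the same term len(sample) times.
        ss_b += len(sample) * (sample_mean - grand_mean) ** 2
    return ss_t, ss_w, ss_b

# Hypothetical data: m = 3 samples of size n = 3.
samples = [[3.0, 5.0, 4.0], [6.0, 8.0, 7.0], [1.0, 2.0, 3.0]]
ss_t, ss_w, ss_b = sums_of_squares(samples)
```

For these assumed samples the sample means are 4, 7 and 2, so $SS_W = 6$, $SS_B = 38$ and $SS_T = 44 = SS_W + SS_B$.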

The following theorem is a technical result and is needed for testing the
null hypothesis against the alternative hypothesis.

Theorem 20.2. Consider the ANOVA model

$$Y_{ij} = \mu_i + \epsilon_{ij}, \quad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n,$$

where $Y_{ij} \sim N(\mu_i, \sigma^2)$. Then

(a) the random variable $\frac{SS_W}{\sigma^2} \sim \chi^2(m(n - 1))$, and

(b) the statistics $SS_W$ and $SS_B$ are independent.

Further, if the null hypothesis $H_o: \mu_1 = \mu_2 = \cdots = \mu_m = \mu$ is true, then

(c) the random variable $\frac{SS_B}{\sigma^2} \sim \chi^2(m - 1)$,

(d) the statistic $\frac{SS_B \, m(n - 1)}{SS_W \, (m - 1)} \sim F(m - 1, m(n - 1))$, and

(e) the random variable $\frac{SS_T}{\sigma^2} \sim \chi^2(nm - 1)$.

Proof: In Chapter 13, we have seen in Theorem 13.7 that if $X_1, X_2, \ldots, X_n$
are independent random variables each one having the distribution $N(\mu, \sigma^2)$,
then their mean $\bar{X}$ and $\sum_{i=1}^{n} (X_i - \bar{X})^2$ have the following properties:

(i) $\bar{X}$ and $\sum_{i=1}^{n} (X_i - \bar{X})^2$ are independent, and

(ii) $\frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \bar{X})^2 \sim \chi^2(n - 1)$.

Now using (i) and (ii), we establish this theorem.

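(a) Using (ii), we see that

$$\frac{1}{\sigma^2} \sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{i\bullet} \right)^2 \sim \chi^2(n - 1)$$

for each $i = 1, 2, \ldots, m$. Since

$$\sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{i\bullet} \right)^2 \quad \text{and} \quad \sum_{j=1}^{n} \left( Y_{i'j} - \bar{Y}_{i'\bullet} \right)^2$$

are independent for $i' \neq i$, we obtain

$$\sum_{i=1}^{m} \frac{1}{\sigma^2} \sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{i\bullet} \right)^2 \sim \chi^2(m(n - 1)).$$

Hence

$$\frac{SS_W}{\sigma^2} = \frac{1}{\sigma^2} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{i\bullet} \right)^2 = \sum_{i=1}^{m} \frac{1}{\sigma^2} \sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{i\bullet} \right)^2 \sim \chi^2(m(n - 1)).$$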

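(b) Since for each $i = 1, 2, \ldots, m$, the random variables $Y_{i1}, Y_{i2}, \ldots, Y_{in}$ are
independent and

$$Y_{i1}, Y_{i2}, \ldots, Y_{in} \sim N(\mu_i, \sigma^2)$$

we conclude by (i) that

$$\sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{i\bullet} \right)^2 \quad \text{and} \quad \bar{Y}_{i\bullet}$$

are independent. Further

$$\sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{i\bullet} \right)^2 \quad \text{and} \quad \bar{Y}_{i'\bullet}$$

are independent for $i' \neq i$. Therefore, each of the statistics

$$\sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{i\bullet} \right)^2, \quad i = 1, 2, \ldots, m$$

is independent of the statistics $\bar{Y}_{1\bullet}, \bar{Y}_{2\bullet}, \ldots, \bar{Y}_{m\bullet}$, and the statistics

$$\sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{i\bullet} \right)^2, \quad i = 1, 2, \ldots, m$$

are independent. Thus it follows that the sets

$$\sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{i\bullet} \right)^2, \; i = 1, 2, \ldots, m \quad \text{and} \quad \bar{Y}_{i\bullet}, \; i = 1, 2, \ldots, m$$

are independent. Thus

$$\sum_{i=1}^{m} \sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{i\bullet} \right)^2 \quad \text{and} \quad \sum_{i=1}^{m} \sum_{j=1}^{n} \left( \bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet} \right)^2$$

are independent. Hence by definition, the statistics $SS_W$ and $SS_B$ are
independent.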

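Suppose the null hypothesis $H_o: \mu_1 = \mu_2 = \cdots = \mu_m = \mu$ is true.

(c) Under $H_o$, the random variables $\bar{Y}_{1\bullet}, \bar{Y}_{2\bullet}, \ldots, \bar{Y}_{m\bullet}$ are independent and
identically distributed with $N\left( \mu, \frac{\sigma^2}{n} \right)$. Therefore by (ii)

$$\frac{n}{\sigma^2} \sum_{i=1}^{m} \left( \bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet} \right)^2 \sim \chi^2(m - 1).$$

Hence

$$\frac{SS_B}{\sigma^2} = \frac{1}{\sigma^2} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( \bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet} \right)^2 = \frac{n}{\sigma^2} \sum_{i=1}^{m} \left( \bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet} \right)^2 \sim \chi^2(m - 1).$$

(d) Since

$$\frac{SS_W}{\sigma^2} \sim \chi^2(m(n - 1))$$

and

$$\frac{SS_B}{\sigma^2} \sim \chi^2(m - 1)$$

are independent by (b), therefore

$$\frac{\; \frac{SS_B}{(m - 1) \sigma^2} \;}{\; \frac{SS_W}{m(n - 1) \sigma^2} \;} \sim F(m - 1, m(n - 1)).$$

That is

$$\frac{\; \frac{SS_B}{m - 1} \;}{\; \frac{SS_W}{m(n - 1)} \;} \sim F(m - 1, m(n - 1)).$$

(e) Under $H_o$, the random variables $Y_{ij}$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$ are
independent and each has the distribution $N(\mu, \sigma^2)$. By (ii) we see that

$$\frac{1}{\sigma^2} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( Y_{ij} - \bar{Y}_{\bullet\bullet} \right)^2 \sim \chi^2(nm - 1).$$

Hence we have

$$\frac{SS_T}{\sigma^2} \sim \chi^2(nm - 1)$$

and the proof of the theorem is now complete.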

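From Theorem 20.1, we see that the maximum likelihood estimator of each
$\mu_i$ ($i = 1, 2, \ldots, m$) is given by

$$\hat{\mu}_i = \bar{Y}_{i\bullet},$$

and since $\bar{Y}_{i\bullet} \sim N\left( \mu_i, \frac{\sigma^2}{n} \right)$,

$$E(\hat{\mu}_i) = E(\bar{Y}_{i\bullet}) = \mu_i.$$

Thus the maximum likelihood estimators are unbiased estimators of $\mu_i$ for
$i = 1, 2, \ldots, m$.

Since

$$\hat{\sigma}^2 = \frac{SS_W}{mn}$$

and by Theorem 20.2, $\frac{1}{\sigma^2} SS_W \sim \chi^2(m(n - 1))$, we have

$$E(\hat{\sigma}^2) = E\left( \frac{SS_W}{mn} \right) = \frac{1}{mn} \sigma^2 E\left( \frac{1}{\sigma^2} SS_W \right) = \frac{1}{mn} \sigma^2 \, m(n - 1) \neq \sigma^2.$$

Thus the maximum likelihood estimator $\hat{\sigma}^2$ of $\sigma^2$ is biased. However, the
estimator $\frac{SS_W}{m(n - 1)}$ is an unbiased estimator. Similarly, when $H_o$ is true, the estimator $\frac{SS_T}{mn - 1}$ is
an unbiased estimator whereas $\frac{SS_T}{mn}$ is a biased estimator of $\sigma^2$.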


Theorem 20.3. Suppose the one-way ANOVA model is given by the equation
(2) where the $\epsilon_{ij}$'s are independent and normally distributed random
variables with mean zero and variance $\sigma^2$ for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$.
The null hypothesis $H_o: \mu_1 = \mu_2 = \cdots = \mu_m = \mu$ is rejected whenever the
test statistic $T$ satisfies

$$T = \frac{SS_B / (m - 1)}{SS_W / (m(n - 1))} > F_\alpha(m - 1, m(n - 1)), \qquad (12)$$

where $\alpha$ is the significance level of the hypothesis test and $F_\alpha(m - 1, m(n - 1))$
denotes the $100(1 - \alpha)$ percentile of the F-distribution with $m - 1$ numerator
and $nm - m$ denominator degrees of freedom.


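Proof: Under the null hypothesis $H_o: \mu_1 = \mu_2 = \cdots = \mu_m = \mu$, the
likelihood function takes the form

$$L(\mu, \sigma^2) = \prod_{i=1}^{m} \prod_{j=1}^{n} \frac{1}{\sqrt{2 \pi \sigma^2}} \, e^{-\frac{(Y_{ij} - \mu)^2}{2 \sigma^2}} = \left( \frac{1}{\sqrt{2 \pi \sigma^2}} \right)^{nm} e^{-\frac{1}{2 \sigma^2} \sum_{i=1}^{m} \sum_{j=1}^{n} (Y_{ij} - \mu)^2}.$$

Taking the natural logarithm of the likelihood function and then maximizing
it, we obtain

$$\hat{\mu} = \bar{Y}_{\bullet\bullet} \quad \text{and} \quad \hat{\sigma}^2_{H_o} = \frac{1}{mn} SS_T$$

as the maximum likelihood estimators of $\mu$ and $\sigma^2$, respectively. Inserting
these estimators into the likelihood function, we have the maximum of the
likelihood function, that is

$$\max L(\mu, \sigma^2) = \left( \frac{1}{\sqrt{2 \pi \hat{\sigma}^2_{H_o}}} \right)^{nm} e^{-\frac{1}{2 \hat{\sigma}^2_{H_o}} \sum_{i=1}^{m} \sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{\bullet\bullet})^2}.$$

Simplifying the above expression, we see that

$$\max L(\mu, \sigma^2) = \left( \frac{1}{\sqrt{2 \pi \hat{\sigma}^2_{H_o}}} \right)^{nm} e^{-\frac{mn}{2 \, SS_T} SS_T}$$

which is

$$\max L(\mu, \sigma^2) = \left( \frac{1}{\sqrt{2 \pi \hat{\sigma}^2_{H_o}}} \right)^{nm} e^{-\frac{mn}{2}}. \qquad (13)$$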


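When no restrictions are imposed, we get the maximum of the likelihood function
from Theorem 20.1 as

$$\max L(\mu_1, \mu_2, \ldots, \mu_m, \sigma^2) = \left( \frac{1}{\sqrt{2 \pi \hat{\sigma}^2}} \right)^{nm} e^{-\frac{1}{2 \hat{\sigma}^2} \sum_{i=1}^{m} \sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{i\bullet})^2}.$$

Simplifying the above expression, we see that

$$\max L(\mu_1, \mu_2, \ldots, \mu_m, \sigma^2) = \left( \frac{1}{\sqrt{2 \pi \hat{\sigma}^2}} \right)^{nm} e^{-\frac{mn}{2 \, SS_W} SS_W}$$

which is

$$\max L(\mu_1, \mu_2, \ldots, \mu_m, \sigma^2) = \left( \frac{1}{\sqrt{2 \pi \hat{\sigma}^2}} \right)^{nm} e^{-\frac{mn}{2}}. \qquad (14)$$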

Next we find the likelihood ratio statistic $W$ for testing the null hypothesis
$H_o: \mu_1 = \mu_2 = \cdots = \mu_m = \mu$. Recall that the likelihood ratio statistic
$W$ can be found by evaluating

$$W = \frac{\max L(\mu, \sigma^2)}{\max L(\mu_1, \mu_2, \ldots, \mu_m, \sigma^2)}.$$

Using (13) and (14), we see that

$$W = \left( \frac{\hat{\sigma}^2}{\hat{\sigma}^2_{H_o}} \right)^{\frac{mn}{2}}. \qquad (15)$$

Hence the likelihood ratio test to reject the null hypothesis $H_o$ is given by
the inequality

$$W < k_0$$

where $k_0$ is a constant. Using (15) and simplifying, we get

$$\frac{\hat{\sigma}^2_{H_o}}{\hat{\sigma}^2} > k_1$$

where $k_1 = \left( \frac{1}{k_0} \right)^{\frac{2}{mn}}$. Hence

$$\frac{SS_T / mn}{SS_W / mn} = \frac{\hat{\sigma}^2_{H_o}}{\hat{\sigma}^2} > k_1.$$

Using Lemma 20.1 we have

$$\frac{SS_W + SS_B}{SS_W} > k_1.$$

Therefore

$$\frac{SS_B}{SS_W} > k \qquad (16)$$

where $k = k_1 - 1$. In order to find the cutoff point $k$ in (16), we use Theorem
20.2 (d). Therefore

$$T = \frac{SS_B / (m - 1)}{SS_W / (m(n - 1))} > \frac{m(n - 1)}{m - 1} \, k.$$

Since $T$ has F distribution, we obtain

$$\frac{m(n - 1)}{m - 1} \, k = F_\alpha(m - 1, m(n - 1)).$$

Thus, at a significance level $\alpha$, reject the null hypothesis $H_o$ if

$$T = \frac{SS_B / (m - 1)}{SS_W / (m(n - 1))} > F_\alpha(m - 1, m(n - 1))$$

and the proof of the theorem is complete.


The various quantities used in carrying out the test described in Theorem
20.3 are presented in a tabular form known as the ANOVA table.

    Source of     Sums of     Degree of      Mean                               F-statistics
    variation     squares     freedom        squares                            T

    Between       $SS_B$      $m - 1$        $MS_B = \frac{SS_B}{m - 1}$        $T = \frac{MS_B}{MS_W}$
    Within        $SS_W$      $m(n - 1)$     $MS_W = \frac{SS_W}{m(n - 1)}$
    Total         $SS_T$      $mn - 1$

    Table 20.1. One-Way ANOVA Table

At a significance level $\alpha$, the likelihood ratio test is: "Reject the null
hypothesis $H_o: \mu_1 = \mu_2 = \cdots = \mu_m = \mu$ if $T > F_\alpha(m - 1, m(n - 1))$." One
can also use the notion of p-value to perform this hypothesis test. If the
value of the test statistic is $T = \gamma$, then the p-value is defined as

$$p\text{-value} = P(F(m - 1, m(n - 1)) \geq \gamma).$$

Alternatively, at a significance level $\alpha$, the likelihood ratio test is: "Reject
the null hypothesis $H_o: \mu_1 = \mu_2 = \cdots = \mu_m = \mu$ if p-value $< \alpha$."

The following figure illustrates the notions of between sample variation 
and within sample variation. 



The ANOVA model described in (2), that is

$$Y_{ij} = \mu_i + \epsilon_{ij} \quad \text{for } i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n,$$

can be rewritten as

$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij} \quad \text{for } i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n,$$

where $\mu$ is the mean of the $m$ values of $\mu_i$, and $\sum_{i=1}^{m} \alpha_i = 0$. The quantity $\alpha_i$ is
called the effect of the $i$th treatment. Thus any observed value is the sum of
an overall mean $\mu$, a treatment or class deviation $\alpha_i$, and a random element
from a normally distributed random variable with mean zero and variance
$\sigma^2$. This model is called model I, the fixed effects model. The effects of the
treatments or classes, measured by the parameters $\alpha_i$, are regarded as fixed
but unknown quantities to be estimated. In this fixed effects model the null
hypothesis $H_o$ is now

$$H_o: \alpha_1 = \alpha_2 = \cdots = \alpha_m = 0$$

and the alternative hypothesis is

$$H_a: \text{not all the } \alpha_i \text{ are zero.}$$

The random effects model, also known as model II, is given by

$$Y_{ij} = \mu + A_i + \epsilon_{ij} \quad \text{for } i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n,$$

where $\mu$ is the overall mean and

$$A_i \sim N(0, \sigma_A^2) \quad \text{and} \quad \epsilon_{ij} \sim N(0, \sigma^2).$$

In this model, the variances $\sigma_A^2$ and $\sigma^2$ are unknown quantities to be estimated.
The null hypothesis of the random effects model is $H_o: \sigma_A^2 = 0$ and
the alternative hypothesis is $H_a: \sigma_A^2 > 0$. In this chapter we do not consider
the random effects model.

Before we present some examples, we point out the assumptions on which
the ANOVA is based. The ANOVA is based on the following three assumptions:

(1) Independent Samples: The samples taken from the population under
consideration should be independent of one another.

(2) Normal Population: For each population, the variable under consideration
should be normally distributed.

(3) Equal Variance: The variances of the variables under consideration
should be the same for all the populations.

Example 20.1. The data in the following table gives the number of hours of
relief provided by 5 different brands of headache tablets administered to 25
subjects experiencing fevers of 38°C or more. Perform the analysis of variance
and test the hypothesis at the 0.05 level of significance that the mean number
of hours of relief provided by the tablets is the same for all 5 brands.

              Tablets
     A    B    C    D    E
     5    9    3    2    7
     4    7    5    3    6
     8    8    2    4    9
     6    6    3    1    4
     3    9    7    4    7


Answer: Using the formulas (8), (9) and (10), we compute the sums of
squares $SS_W$, $SS_B$ and $SS_T$ as

$$SS_W = 57.60, \quad SS_B = 79.44, \quad \text{and} \quad SS_T = 137.04.$$

The ANOVA table for this problem is shown below.

    Source of     Sums of     Degree of     Mean        F-statistics
    variation     squares     freedom       squares     T

    Between        79.44          4          19.86         6.90
    Within         57.60         20           2.88
    Total         137.04         24

At the significance level $\alpha = 0.05$, we find from the F-table that $F_{0.05}(4, 20) =
2.8661$. Since

$$6.90 = T > F_{0.05}(4, 20) = 2.8661$$

we reject the null hypothesis that the mean number of hours of relief provided
by the tablets is the same for all 5 brands.
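
The entries of the ANOVA table in Example 20.1 can be recomputed from the raw data. The following is a minimal sketch: the function name `one_way_anova` and the dictionary layout are hypothetical, the fifth brand is labelled E as an assumption about the column header, and the critical value 2.8661 is copied from the F-table quoted in the text rather than computed.

```python
def one_way_anova(samples):
    """Return the one-way ANOVA table entries for a list of samples."""
    m = len(samples)
    total_n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / total_n
    ss_w = 0.0
    ss_b = 0.0
    for s in samples:
        mean_i = sum(s) / len(s)
        ss_w += sum((y - mean_i) ** 2 for y in s)
        ss_b += len(s) * (mean_i - grand_mean) ** 2
    df_b, df_w = m - 1, total_n - m
    ms_b, ms_w = ss_b / df_b, ss_w / df_w
    return {"SS_B": ss_b, "SS_W": ss_w, "SS_T": ss_b + ss_w,
            "df_B": df_b, "df_W": df_w, "MS_B": ms_b, "MS_W": ms_w,
            "T": ms_b / ms_w}

# Hours of relief, read column by column from the table of Example 20.1.
brands = {
    "A": [5, 4, 8, 6, 3],
    "B": [9, 7, 8, 6, 9],
    "C": [3, 5, 2, 3, 7],
    "D": [2, 3, 4, 1, 4],
    "E": [7, 6, 9, 4, 7],
}
table = one_way_anova(list(brands.values()))

f_crit = 2.8661                 # F_{0.05}(4, 20), taken from the F-table in the text
reject = table["T"] > f_crit
```

This recomputation gives $SS_W = 57.60$ and $SS_B = 79.44$, which add up to $SS_T = 137.04$ and match the mean square $19.86 = 79.44 / 4$ and the statistic $T \approx 6.90$.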

Note that using a statistical package like MINITAB, SAS or SPSS we
can compute the p-value to be

$$p\text{-value} = P(F(4, 20) \geq 6.90) = 0.001.$$

Hence again we reach the same conclusion since the p-value is less than the given
$\alpha$ for this problem.

Example 20.2. Perform the analysis of variance and test the null hypothesis
at the 0.05 level of significance for the following two data sets.

           Data Set 1                   Data Set 2
             Sample                       Sample
       A       B       C            A       B       C
      8.1     8.0    14.8          9.2     9.5     9.4
      4.2    15.1     5.3          9.1     9.5     9.3
     14.7     4.7    11.1          9.2     9.5     9.3
      9.9    10.4     7.9          9.2     9.6     9.3
     12.1     9.0     9.3          9.3     9.5     9.2
      6.2     9.8     7.4          9.2     9.4     9.3


Answer: Computing the sums of squares $SS_W$, $SS_B$ and $SS_T$, we have the
following two ANOVA tables:

    Source of     Sums of     Degree of     Mean        F-statistics
    variation     squares     freedom       squares     T

    Between         0.3           2           0.1          0.01
    Within        187.2          15          12.5
    Total         187.5          17

and

    Source of     Sums of     Degree of     Mean        F-statistics
    variation     squares     freedom       squares     T

    Between        0.280          2          0.140         35.0
    Within         0.060         15          0.004
    Total          0.340         17

At the significance level $\alpha = 0.05$, we find from the F-table that $F_{0.05}(2, 15) =
3.68$. For the first data set, since

$$0.01 = T < F_{0.05}(2, 15) = 3.68$$

we do not reject the null hypothesis whereas for the second data set,

$$35.0 = T > F_{0.05}(2, 15) = 3.68$$

we reject the null hypothesis.

Remark 20.1. Note that the sample means are the same in both data
sets. However, there is less variation among the sample points within the samples
of the second data set. The ANOVA finds more significant differences
among the means in the second data set. This example suggests that the
larger the variation among sample means compared with the variation of
the measurements within samples, the greater is the evidence to indicate a
difference among population means.
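
The point of Remark 20.1 can be seen by recomputing both tables of Example 20.2. In the sketch below the helper names `sample_means` and `f_statistic` are hypothetical, and the critical value 3.68 is the tabulated $F_{0.05}(2, 15)$ quoted in the example.

```python
def sample_means(samples):
    return [sum(s) / len(s) for s in samples]

def f_statistic(samples):
    """Return (SS_B, SS_W, T) for a one-way layout given as a list of samples."""
    m = len(samples)
    total_n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / total_n
    means = sample_means(samples)
    ss_b = sum(len(s) * (mean - grand_mean) ** 2 for s, mean in zip(samples, means))
    ss_w = sum((y - mean) ** 2 for s, mean in zip(samples, means) for y in s)
    t_stat = (ss_b / (m - 1)) / (ss_w / (total_n - m))
    return ss_b, ss_w, t_stat

# Samples A, B, C of Example 20.2, read column by column.
data_set_1 = [[8.1, 4.2, 14.7, 9.9, 12.1, 6.2],
              [8.0, 15.1, 4.7, 10.4, 9.0, 9.8],
              [14.8, 5.3, 11.1, 7.9, 9.3, 7.4]]
data_set_2 = [[9.2, 9.1, 9.2, 9.2, 9.3, 9.2],
              [9.5, 9.5, 9.5, 9.6, 9.5, 9.4],
              [9.4, 9.3, 9.3, 9.3, 9.2, 9.3]]

means_1, means_2 = sample_means(data_set_1), sample_means(data_set_2)
ss_b1, ss_w1, t1 = f_statistic(data_set_1)
ss_b2, ss_w2, t2 = f_statistic(data_set_2)
f_crit = 3.68  # F_{0.05}(2, 15), taken from the text
```

Both data sets have sample means 9.2, 9.5 and 9.3 and hence the same $SS_B = 0.28$. Only the within sum of squares differs (about 187.2 against 0.06), and that alone moves $T$ from about 0.01 to 35.0.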

20.2. One-Way Analysis of Variance with Unequal Sample Sizes 

In the previous section, we examined the theory of ANOVA when the samples
have the same size. When the samples have the same size we say that the ANOVA
is in the balanced case. In this section we examine the theory of ANOVA
for the unbalanced case, that is when the samples are of different sizes. In
experimental work, one often encounters the unbalanced case due to the death of
experimental animals in a study, the dropout of human subjects from
a study, or damage to experimental materials used in a study. Our
analysis of the last section for equal sample sizes remains valid but has to
be modified to accommodate the different sample sizes.

Consider $m$ independent samples of respective sizes $n_1, n_2, \ldots, n_m$, where
the members of the $i$th sample, $Y_{i1}, Y_{i2}, \ldots, Y_{i n_i}$, are normal random variables
with mean $\mu_i$ and unknown variance $\sigma^2$. That is,

$$Y_{ij} \sim N(\mu_i, \sigma^2), \quad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n_i.$$

Let us denote $N = n_1 + n_2 + \cdots + n_m$. Again, we will be interested in testing
the null hypothesis

$$H_o: \mu_1 = \mu_2 = \cdots = \mu_m = \mu$$

against the alternative hypothesis

$$H_a: \text{not all the means are equal.}$$

Now defining

$$\bar{Y}_{i\bullet} = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}, \qquad (17)$$

$$\bar{Y}_{\bullet\bullet} = \frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{n_i} Y_{ij}, \qquad (18)$$

$$SS_T = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \left( Y_{ij} - \bar{Y}_{\bullet\bullet} \right)^2, \qquad (19)$$

$$SS_W = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \left( Y_{ij} - \bar{Y}_{i\bullet} \right)^2, \qquad (20)$$

$$SS_B = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \left( \bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet} \right)^2 \qquad (21)$$

we have the following results analogous to the results in the previous section.

Theorem 20.4. Suppose the one-way ANOVA model is given by the equation
(2) where the $\epsilon_{ij}$'s are independent and normally distributed random
variables with mean zero and variance $\sigma^2$ for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n_i$.
Then the MLE's of the parameters $\mu_i$ ($i = 1, 2, \ldots, m$) and $\sigma^2$ of the model
are given by

$$\hat{\mu}_i = \bar{Y}_{i\bullet}, \quad i = 1, 2, \ldots, m,$$

$$\hat{\sigma}^2 = \frac{1}{N} SS_W,$$

where $\bar{Y}_{i\bullet} = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}$ and $SS_W = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \left( Y_{ij} - \bar{Y}_{i\bullet} \right)^2$ is the within samples
sum of squares.

Lemma 20.2. The total sum of squares is equal to the sum of the within and
between sums of squares, that is $SS_T = SS_W + SS_B$.

Theorem 20.5. Consider the ANOVA model

$$Y_{ij} = \mu_i + \epsilon_{ij}, \quad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n_i,$$

where $Y_{ij} \sim N(\mu_i, \sigma^2)$. Then

(a) the random variable $\frac{SS_W}{\sigma^2} \sim \chi^2(N - m)$, and

(b) the statistics $SS_W$ and $SS_B$ are independent.

Further, if the null hypothesis $H_o: \mu_1 = \mu_2 = \cdots = \mu_m = \mu$ is true, then

(c) the random variable $\frac{SS_B}{\sigma^2} \sim \chi^2(m - 1)$,

(d) the statistic $\frac{SS_B \, (N - m)}{SS_W \, (m - 1)} \sim F(m - 1, N - m)$, and

(e) the random variable $\frac{SS_T}{\sigma^2} \sim \chi^2(N - 1)$.

Theorem 20.6. Suppose the one-way ANOVA model is given by the equation
(2) where the $\epsilon_{ij}$'s are independent and normally distributed random
variables with mean zero and variance $\sigma^2$ for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n_i$.
The null hypothesis $H_o: \mu_1 = \mu_2 = \cdots = \mu_m = \mu$ is rejected whenever the
test statistic $T$ satisfies

$$T = \frac{SS_B / (m - 1)}{SS_W / (N - m)} > F_\alpha(m - 1, N - m),$$

where $\alpha$ is the significance level of the hypothesis test and $F_\alpha(m - 1, N - m)$
denotes the $100(1 - \alpha)$ percentile of the F-distribution with $m - 1$ numerator
and $N - m$ denominator degrees of freedom.

The corresponding ANOVA table for this case is

    Source of     Sums of     Degree of     Mean                             F-statistics
    variation     squares     freedom       squares                          T

    Between       $SS_B$      $m - 1$       $MS_B = \frac{SS_B}{m - 1}$      $T = \frac{MS_B}{MS_W}$
    Within        $SS_W$      $N - m$       $MS_W = \frac{SS_W}{N - m}$
    Total         $SS_T$      $N - 1$

    Table 20.2. One-Way ANOVA Table with unequal sample size

Example 20.3. Three sections of elementary statistics were taught by different
instructors. A common final examination was given. The test scores
are given in the table below. Perform the analysis of variance and test the
hypothesis at the 0.05 level of significance that there is a difference in the
average grades given by the three instructors.

                  Elementary Statistics
    Instructor A      Instructor B      Instructor C
         75                90                17
         91                80                81
         83                50                55
         45                93                70
         82                53                61
         75                87                43
         68                76                89
         47                82                73
         38                78                58
                           80                70
                           33
                           79


Answer: Using the formulas (17) - (21), we compute the sums of squares
$SS_W$, $SS_B$ and $SS_T$ as

$$SS_W = 10362, \quad SS_B = 755, \quad \text{and} \quad SS_T = 11117.$$

The ANOVA table for this problem is shown below.

    Source of     Sums of     Degree of     Mean        F-statistics
    variation     squares     freedom       squares     T

    Between          755          2           377          1.02
    Within         10362         28           370
    Total          11117         30

At the significance level $\alpha = 0.05$, we find from the F-table that $F_{0.05}(2, 28) =
3.34$. Since

$$1.02 = T < F_{0.05}(2, 28) = 3.34$$

we accept the null hypothesis that there is no difference in the average grades
given by the three instructors.
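
The unbalanced computation of Example 20.3 follows formulas (17) to (21) directly. The sketch below makes one assumption that should be checked against the printed table: the scores are assigned to instructors with sample sizes 9, 12 and 10, which is the reading of the table that reproduces the stated sums of squares. The function name and the critical value 3.34 (copied from the text) are likewise not computed here.

```python
def unbalanced_anova(samples):
    """Return (SS_B, SS_W, df_B, df_W, T) for samples of possibly unequal sizes."""
    m = len(samples)
    total_n = sum(len(s) for s in samples)                  # N = n_1 + ... + n_m
    grand_mean = sum(sum(s) for s in samples) / total_n     # formula (18)
    ss_w = 0.0
    ss_b = 0.0
    for s in samples:
        mean_i = sum(s) / len(s)                            # formula (17)
        ss_w += sum((y - mean_i) ** 2 for y in s)           # formula (20)
        ss_b += len(s) * (mean_i - grand_mean) ** 2         # formula (21)
    df_b, df_w = m - 1, total_n - m
    return ss_b, ss_w, df_b, df_w, (ss_b / df_b) / (ss_w / df_w)

# Assumed assignment of the scores of Example 20.3 to the three instructors.
instructor_a = [75, 91, 83, 45, 82, 75, 68, 47, 38]
instructor_b = [90, 80, 50, 93, 53, 87, 76, 82, 78, 80, 33, 79]
instructor_c = [17, 81, 55, 70, 61, 43, 89, 73, 58, 70]

ss_b, ss_w, df_b, df_w, t_stat = unbalanced_anova(
    [instructor_a, instructor_b, instructor_c])

f_crit = 3.34                  # F_{0.05}(2, 28), taken from the text
reject = t_stat > f_crit
```

Under this reading of the table the script gives $SS_B \approx 755$, $SS_W \approx 10362$ with 2 and 28 degrees of freedom, and $T \approx 1.02$, so the null hypothesis is not rejected.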

Note that using a statistical package like MINITAB, SAS or SPSS we
can compute the p-value to be

$$p\text{-value} = P(F(2, 28) \geq 1.02) = 0.374.$$

Hence again we reach the same conclusion since the p-value is greater than the given
$\alpha$ for this problem.

We conclude this section by pointing out the advantages of choosing equal
sample sizes (balanced case) over the choice of unequal sample sizes (unbalanced
case). The first advantage is that the F-statistic is insensitive to slight
departures from the assumption of equal variances when the sample sizes are
equal. The second advantage is that the choice of equal sample sizes minimizes
the probability of committing a type II error.

20.3. Pairwise Comparisons

When the null hypothesis is rejected using the F-test in ANOVA, one
may still want to know where the differences among the means lie. There are
several methods for locating the significant differences in the means
after the ANOVA procedure is performed. Among the most commonly
used tests are the Scheffé test and the Tukey test. In this section, we give a
brief description of these tests.

In order to perform the Scheffé test, we have to compare the means two
at a time using all possible combinations of means. Since we have $m$ means,
we need $\binom{m}{2}$ pairwise comparisons. A pairwise comparison can be viewed
as a test of the null hypothesis $H_0 : \mu_i = \mu_k$ against the alternative
$H_a : \mu_i \neq \mu_k$ for all $i \neq k$.

To conduct this test we compute the statistic

$$F_s = \frac{\left(\overline{Y}_{i\bullet} - \overline{Y}_{k\bullet}\right)^2}{MS_W \left(\frac{1}{n_i} + \frac{1}{n_k}\right)},$$

where $\overline{Y}_{i\bullet}$ and $\overline{Y}_{k\bullet}$ are the means of the samples being compared, $n_i$ and
$n_k$ are the respective sample sizes, and $MS_W$ is the within-group mean
square. We reject the null hypothesis at a significance level of $\alpha$ if

$$F_s > (m - 1)\, F_\alpha(m - 1, N - m),$$

where $N = n_1 + n_2 + \cdots + n_m$.

Example 20.4. Perform the analysis of variance and test the null hypothesis
at the 0.05 level of significance for the data given in the table below.
Further perform a Scheffé test to determine where the significant differences
in the means lie.

            Sample
   1          2          3
  9.2        9.5        9.4
  9.1        9.5        9.3
  9.2        9.5        9.3
  9.2        9.6        9.3
  9.3        9.5        9.2
  9.2        9.4        9.3


Answer: The ANOVA table for this data is given by 


Source of     Sums of     Degrees of     Mean        F-statistic
variation     squares     freedom        squares     F

Between        0.280           2          0.140        35.0
Within         0.060          15          0.004
Total          0.340          17


At the significance level $\alpha = 0.05$, we find from the F-table that
$F_{0.05}(2, 15) = 3.68$. Since

$$35.0 = F > F_{0.05}(2, 15) = 3.68$$

we reject the null hypothesis. Now we perform the Scheffé test to determine
where the significant differences in the means lie. From the given data, we
obtain $\overline{Y}_{1\bullet} = 9.2$, $\overline{Y}_{2\bullet} = 9.5$ and $\overline{Y}_{3\bullet} = 9.3$. Since $m = 3$, we have to
make 3 pairwise comparisons, namely $\mu_1$ with $\mu_2$, $\mu_1$ with $\mu_3$, and $\mu_2$
with $\mu_3$. First we consider the comparison of $\mu_1$ with $\mu_2$. For this case,
we find

$$F_s = \frac{\left(\overline{Y}_{1\bullet} - \overline{Y}_{2\bullet}\right)^2}{MS_W \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \frac{(9.2 - 9.5)^2}{0.004 \left(\frac{1}{6} + \frac{1}{6}\right)} = 67.5.$$

Since

$$67.5 = F_s > 2 F_{0.05}(2, 15) = 7.36$$

we reject the null hypothesis $H_0 : \mu_1 = \mu_2$ in favor of the alternative
$H_a : \mu_1 \neq \mu_2$.

Next we consider the comparison of $\mu_1$ with $\mu_3$. For this case, we find

$$F_s = \frac{\left(\overline{Y}_{1\bullet} - \overline{Y}_{3\bullet}\right)^2}{MS_W \left(\frac{1}{n_1} + \frac{1}{n_3}\right)} = \frac{(9.2 - 9.3)^2}{0.004 \left(\frac{1}{6} + \frac{1}{6}\right)} = 7.5.$$

Since

$$7.5 = F_s > 2 F_{0.05}(2, 15) = 7.36$$

we reject the null hypothesis $H_0 : \mu_1 = \mu_3$ in favor of the alternative
$H_a : \mu_1 \neq \mu_3$.

Finally we consider the comparison of $\mu_2$ with $\mu_3$. For this case, we find

$$F_s = \frac{\left(\overline{Y}_{2\bullet} - \overline{Y}_{3\bullet}\right)^2}{MS_W \left(\frac{1}{n_2} + \frac{1}{n_3}\right)} = \frac{(9.5 - 9.3)^2}{0.004 \left(\frac{1}{6} + \frac{1}{6}\right)} = 30.0.$$

Since

$$30.0 = F_s > 2 F_{0.05}(2, 15) = 7.36$$

we reject the null hypothesis $H_0 : \mu_2 = \mu_3$ in favor of the alternative
$H_a : \mu_2 \neq \mu_3$.
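
The three Scheffé comparisons above follow one pattern, so they are easy to script. The sketch below is only an illustration in Python: the helper name `scheffe_statistic` is hypothetical, and the critical value is typed in from the F-table rather than computed.

```python
from itertools import combinations

def scheffe_statistic(mean_i, mean_k, n_i, n_k, ms_w):
    """Scheffe statistic F_s for one pair of sample means."""
    return (mean_i - mean_k) ** 2 / (ms_w * (1 / n_i + 1 / n_k))

means = {1: 9.2, 2: 9.5, 3: 9.3}   # sample means from Example 20.4
n, ms_w = 6, 0.004                 # common sample size and within mean square
critical = 2 * 3.68                # (m - 1) * F_0.05(2, 15), read from the F-table

results = {
    (i, k): scheffe_statistic(means[i], means[k], n, n, ms_w)
    for i, k in combinations(means, 2)
}
for pair, f_s in results.items():
    print(pair, round(f_s, 1), "reject" if f_s > critical else "do not reject")
```

Each pair is compared against the same critical value $(m - 1) F_\alpha(m - 1, N - m)$, which is what makes the Scheffé procedure a simultaneous test.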


Next consider the Tukey test. The Tukey test is applicable when we have a
balanced case, that is, when the sample sizes are equal. For the Tukey test we
compute the statistic

$$Q = \frac{\overline{Y}_{i\bullet} - \overline{Y}_{k\bullet}}{\sqrt{\frac{MS_W}{n}}},$$

where $\overline{Y}_{i\bullet}$ and $\overline{Y}_{k\bullet}$ are the means of the samples being compared, $n$ is the
common size of the samples, and $MS_W$ is the within-group mean square.
At a significance level $\alpha$, we reject the null hypothesis $H_0$ if

$$|Q| > Q_\alpha(m, \nu),$$

where $\nu$ represents the degrees of freedom for the error mean square.

Example 20.5. For the data given in Example 20.4 perform a Tukey test
to determine where the significant differences in the means lie.

Answer: We have seen that $\overline{Y}_{1\bullet} = 9.2$, $\overline{Y}_{2\bullet} = 9.5$ and $\overline{Y}_{3\bullet} = 9.3$.

First we compare $\mu_1$ with $\mu_2$. For this we compute

$$Q = \frac{\overline{Y}_{1\bullet} - \overline{Y}_{2\bullet}}{\sqrt{\frac{MS_W}{n}}} = \frac{9.2 - 9.5}{\sqrt{\frac{0.004}{6}}} = -11.6189.$$

Since

$$11.6189 = |Q| > Q_{0.05}(2, 15) = 3.01$$

we reject the null hypothesis $H_0 : \mu_1 = \mu_2$ in favor of the alternative
$H_a : \mu_1 \neq \mu_2$.

Next we compare $\mu_1$ with $\mu_3$. For this we compute

$$Q = \frac{\overline{Y}_{1\bullet} - \overline{Y}_{3\bullet}}{\sqrt{\frac{MS_W}{n}}} = \frac{9.2 - 9.3}{\sqrt{\frac{0.004}{6}}} = -3.8729.$$

Since

$$3.8729 = |Q| > Q_{0.05}(2, 15) = 3.01$$

we reject the null hypothesis $H_0 : \mu_1 = \mu_3$ in favor of the alternative
$H_a : \mu_1 \neq \mu_3$.

Finally we compare $\mu_2$ with $\mu_3$. For this we compute

$$Q = \frac{\overline{Y}_{2\bullet} - \overline{Y}_{3\bullet}}{\sqrt{\frac{MS_W}{n}}} = \frac{9.5 - 9.3}{\sqrt{\frac{0.004}{6}}} = 7.7459.$$

Since

$$7.7459 = |Q| > Q_{0.05}(2, 15) = 3.01$$

we reject the null hypothesis $H_0 : \mu_2 = \mu_3$ in favor of the alternative
$H_a : \mu_2 \neq \mu_3$.
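
As a quick check of the arithmetic in Example 20.5, the Tukey statistic can be evaluated directly. This is a minimal sketch in Python; the function name `tukey_statistic` is an assumption made here for illustration, and the critical value still has to be read from a studentized range table.

```python
import math

def tukey_statistic(mean_i, mean_k, n, ms_w):
    """Tukey statistic Q for two sample means from equal-sized samples."""
    return (mean_i - mean_k) / math.sqrt(ms_w / n)

n, ms_w = 6, 0.004                        # values from Example 20.4
q12 = tukey_statistic(9.2, 9.5, n, ms_w)  # mu_1 versus mu_2
q13 = tukey_statistic(9.2, 9.3, n, ms_w)  # mu_1 versus mu_3
q23 = tukey_statistic(9.5, 9.3, n, ms_w)  # mu_2 versus mu_3
print(q12, q13, q23)
```

Only the absolute value $|Q|$ enters the decision rule, so the sign of each result merely records which mean is larger.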

Often in scientific and engineering problems, the experiment dictates
the need for comparing each treatment simultaneously with a control. Now
we describe a test developed by C. W. Dunnett for determining significant
differences between each treatment mean and the control. Suppose we wish
to test the $m$ hypotheses

$$H_0 : \mu_0 = \mu_i \quad \text{versus} \quad H_a : \mu_0 \neq \mu_i \quad \text{for } i = 1, 2, ..., m,$$

where $\mu_0$ represents the mean yield for the population of measurements in
which the control is used. To test the null hypotheses specified by $H_0$ against
two-sided alternatives for an experimental situation in which there are $m$
treatments, excluding the control, and $n$ observations per treatment, we first
calculate

$$D_i = \frac{\overline{Y}_{i\bullet} - \overline{Y}_{0\bullet}}{\sqrt{\frac{2\, MS_W}{n}}}, \qquad i = 1, 2, ..., m.$$

At a significance level $\alpha$, we reject the null hypothesis $H_0$ if

$$|D_i| > D_{\frac{\alpha}{2}}(m, \nu),$$

where $\nu$ represents the degrees of freedom for the error mean square. The
values of the quantity $D_{\frac{\alpha}{2}}(m, \nu)$ are tabulated for various $\alpha$, $m$ and $\nu$.

Example 20.6. For the data given in the table below perform a Dunnett 
test to determine any significant differences between each treatment mean 
and the control. 


Control     Sample 1     Sample 2
  9.2          9.5          9.4
  9.1          9.5          9.3
  9.2          9.5          9.3
  9.2          9.6          9.3
  9.3          9.5          9.2
  9.2          9.4          9.3


Answer: The ANOVA table for this data is given by 


Source of     Sums of     Degrees of     Mean        F-statistic
variation     squares     freedom        squares     F

Between        0.280           2          0.140        35.0
Within         0.060          15          0.004
Total          0.340          17


At the significance level $\alpha = 0.05$, we find from the F-table that
$F_{0.05}(2, 15) = 3.68$ and from the Dunnett table that $D_{0.025}(2, 15) = 2.44$. Since

$$35.0 = F > F_{0.05}(2, 15) = 3.68$$


we reject the null hypothesis. Now we perform the Dunnett test to determine
if there are any significant differences between each treatment mean and the
control. From the given data, we obtain $\overline{Y}_{0\bullet} = 9.2$, $\overline{Y}_{1\bullet} = 9.5$ and $\overline{Y}_{2\bullet} = 9.3$.
Since $m = 2$, we have to make 2 pairwise comparisons, namely $\mu_0$ with $\mu_1$,
and $\mu_0$ with $\mu_2$. First we consider the comparison of $\mu_0$ with $\mu_1$. For this
case, we find

$$D_1 = \frac{\overline{Y}_{1\bullet} - \overline{Y}_{0\bullet}}{\sqrt{\frac{2\, MS_W}{n}}} = \frac{9.5 - 9.2}{\sqrt{\frac{2\,(0.004)}{6}}} = 8.2158.$$

Since

$$8.2158 = D_1 > D_{0.025}(2, 15) = 2.44$$

we reject the null hypothesis $H_0 : \mu_1 = \mu_0$ in favor of the alternative
$H_a : \mu_1 \neq \mu_0$.

Next we find

$$D_2 = \frac{\overline{Y}_{2\bullet} - \overline{Y}_{0\bullet}}{\sqrt{\frac{2\, MS_W}{n}}} = \frac{9.3 - 9.2}{\sqrt{\frac{2\,(0.004)}{6}}} = 2.7386.$$

Since

$$2.7386 = D_2 > D_{0.025}(2, 15) = 2.44$$

we reject the null hypothesis $H_0 : \mu_2 = \mu_0$ in favor of the alternative
$H_a : \mu_2 \neq \mu_0$.

20.4. Tests for the Homogeneity of Variances

One of the assumptions behind ANOVA is that of equal variances, that is,
the variances of the variables under consideration should be the same for all
populations. Earlier we pointed out that the F-statistic is insensitive
to slight departures from the assumption of equal variances when the sample
sizes are equal. Nevertheless it is advisable to run a preliminary test for
homogeneity of variances. Such a test would certainly be advisable in the
case of unequal sample sizes if there is a doubt concerning the homogeneity
of population variances.

Suppose we want to test the null hypothesis

$$H_0 : \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_m^2$$

versus the alternative hypothesis

$$H_a : \text{not all variances are equal}.$$

A frequently used test for the homogeneity of population variances is the
Bartlett test. Bartlett (1937) proposed a test for equal variances that was a
modification of the normal-theory likelihood ratio test.

We will use this test to test the above null hypothesis $H_0$ against $H_a$.
First, we compute the $m$ sample variances $S_1^2, S_2^2, ..., S_m^2$ from the samples of
size $n_1, n_2, ..., n_m$, with $n_1 + n_2 + \cdots + n_m = N$. The test statistic $B_c$ is
given by

$$B_c = \frac{(N - m) \ln S_p^2 - \sum_{i=1}^{m} (n_i - 1) \ln S_i^2}{1 + \frac{1}{3(m - 1)} \left( \sum_{i=1}^{m} \frac{1}{n_i - 1} - \frac{1}{N - m} \right)},$$

where the pooled variance $S_p^2$ is given by

$$S_p^2 = \frac{\sum_{i=1}^{m} (n_i - 1)\, S_i^2}{N - m} = MS_W.$$

It is known that the sampling distribution of $B_c$ is approximately chi-square
with $m - 1$ degrees of freedom, that is

$$B_c \sim \chi^2(m - 1)$$

when $(n_i - 1) \geq 3$. Thus the Bartlett test rejects the null hypothesis
$H_0 : \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_m^2$ at a significance level $\alpha$ if

$$B_c > \chi^2_{1 - \alpha}(m - 1),$$

where $\chi^2_{1 - \alpha}(m - 1)$ denotes the upper $(1 - \alpha)100$ percentile of the chi-square
distribution with $m - 1$ degrees of freedom.

Example 20.7. For the following data perform an ANOVA and then apply 
Bartlett test to examine if the homogeneity of variances condition is met for 
a significance level 0.05. 


                     Data

Sample 1    Sample 2    Sample 3    Sample 4
   34          29          32          34
   28          32          34          29
   29          31          30          32
   37          43          42          28
   42          31          32          32
   27          29          33          34
   29          28          29          29
   35          30          27          31
   25          37          37          30
   29          44          26          37
   41          29          29          43
   40          31          31          42

Answer: The ANOVA table for this data is given by

Source of     Sums of     Degrees of     Mean        F-statistic
variation     squares     freedom        squares     F

Between          16.2          3            5.4        0.20
Within         1202.2         44           27.3
Total          1218.5         47


At the significance level $\alpha = 0.05$, we find from the F-table that
$F_{0.05}(3, 44) = 2.84$. Since

$$0.20 = F < F_{0.05}(3, 44) = 2.84$$

we do not reject the null hypothesis.

Now we compute the Bartlett test statistic $B_c$. From the data the variances
of each group can be found to be

$$S_1^2 = 35.2836, \quad S_2^2 = 30.1401, \quad S_3^2 = 19.4481, \quad S_4^2 = 24.4036.$$

Further, the pooled variance is

$$S_p^2 = MS_W = 27.3.$$

The statistic $B_c$ is

$$B_c = \frac{(N - m) \ln S_p^2 - \sum_{i=1}^{m} (n_i - 1) \ln S_i^2}{1 + \frac{1}{3(m - 1)} \left( \sum_{i=1}^{m} \frac{1}{n_i - 1} - \frac{1}{N - m} \right)}$$

$$= \frac{44 \ln 27.3 - 11 \left[ \ln 35.2836 + \ln 30.1401 + \ln 19.4481 + \ln 24.4036 \right]}{1 + \frac{1}{3(4 - 1)} \left( \frac{4}{12 - 1} - \frac{1}{48 - 4} \right)}$$

$$= \frac{1.0537}{1.0378} = 1.0153.$$

From the chi-square table we find that $\chi^2_{0.95}(3) = 7.815$. Hence, since

$$1.0153 = B_c < \chi^2_{0.95}(3) = 7.815,$$

we do not reject the null hypothesis that the variances are equal. Hence the
Bartlett test suggests that the homogeneity of variances condition is met.
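
The Bartlett statistic is tedious by hand but short in code. The sketch below, in Python, is an illustration only: the function name `bartlett_statistic` is hypothetical, and it takes the group variances quoted in Example 20.7 as given. It computes the pooled variance from those variances without rounding it to 27.3, so its result differs slightly from the hand computation.

```python
import math

def bartlett_statistic(variances, sizes):
    """Bartlett statistic B_c from sample variances and sample sizes."""
    m, big_n = len(variances), sum(sizes)
    # Pooled variance: weighted average of the group variances
    pooled = sum((n - 1) * s2 for s2, n in zip(variances, sizes)) / (big_n - m)
    numerator = (big_n - m) * math.log(pooled) - sum(
        (n - 1) * math.log(s2) for s2, n in zip(variances, sizes)
    )
    denominator = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (big_n - m)) / (3 * (m - 1))
    return numerator / denominator

variances = [35.2836, 30.1401, 19.4481, 24.4036]   # from Example 20.7
b_c = bartlett_statistic(variances, [12, 12, 12, 12])
print(round(b_c, 4))
```

With the unrounded pooled variance the statistic comes out near 1.04 rather than 1.0153; either value is far below $\chi^2_{0.95}(3) = 7.815$, so the conclusion is unchanged.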

The Bartlett test assumes that the $m$ samples are taken from
$m$ normal populations. Thus the Bartlett test is sensitive to departures from
normality. The Levene test is an alternative to the Bartlett test that is less
sensitive to departures from normality. Levene (1960) proposed a test for the
homogeneity of population variances that considers the random variables

$$W_{ij} = \left( Y_{ij} - \overline{Y}_{i\bullet} \right)^2$$

and applies a one-way analysis of variance to these variables. If the F-test is
significant, the homogeneity of variances is rejected.

Levene (1960) also proposed using F-tests based on the variables

$$W_{ij} = \left| Y_{ij} - \overline{Y}_{i\bullet} \right|, \quad W_{ij} = \ln \left| Y_{ij} - \overline{Y}_{i\bullet} \right|, \quad \text{and} \quad W_{ij} = \sqrt{\left| Y_{ij} - \overline{Y}_{i\bullet} \right|}.$$

Brown and Forsythe (1974c) proposed using the transformed variables based
on the absolute deviations from the median, that is $W_{ij} = \left| Y_{ij} - Med(Y_{i\bullet}) \right|$,
where $Med(Y_{i\bullet})$ denotes the median of group $i$. Again if the F-test is
significant, the homogeneity of variances is rejected.

Example 20.8. For the data in Example 20.7 do a Levene test to examine 
if the homogeneity of variances condition is met for a significance level 0.05. 

Answer: From the data we find that $\overline{Y}_{1\bullet} = 33.00$, $\overline{Y}_{2\bullet} = 32.83$, $\overline{Y}_{3\bullet} = 31.83$,
and $\overline{Y}_{4\bullet} = 33.42$. Next we compute $W_{ij} = \left( Y_{ij} - \overline{Y}_{i\bullet} \right)^2$. The resulting
values are given in the table below.

               Transformed Data

Sample 1    Sample 2    Sample 3    Sample 4
    1         14.7         0.0         0.3
   25          0.7         4.7        19.5
   16          3.4         3.4         2.0
   16        103.4       103.4        29.3
   81          3.4         0.0         2.0
   36         14.7         1.4         0.3
   16         23.4         8.0        19.5
    4          8.0        23.4         5.8
   64         17.4        26.7        11.7
   16        124.7        34.0        12.8
   64         14.7         0.0        91.8
   49          3.4         0.7        73.7

Now we perform an ANOVA on the data given in the table above. The
ANOVA table for this data is given by

Source of     Sums of     Degrees of     Mean        F-statistic
variation     squares     freedom        squares     F

Between         1430           3           477         0.46
Within         45491          44          1034
Total          46922          47



At the significance level $\alpha = 0.05$, we find from the F-table that
$F_{0.05}(3, 44) = 2.84$. Since

$$0.46 = F < F_{0.05}(3, 44) = 2.84$$

we do not reject the null hypothesis that the variances are equal. Hence the
Levene test suggests that the homogeneity of variances condition is met.
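
The Levene procedure of Example 20.8 is simply a one-way ANOVA applied to squared deviations, which the following Python sketch makes explicit. It is offered as an illustration under the assumption that the squared-deviation version of the test is wanted; the helper name `f_statistic` is hypothetical.

```python
def f_statistic(groups):
    """One-way ANOVA F-statistic for a list of samples."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_b = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_w = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_b / (len(groups) - 1)) / (ss_w / (n_total - len(groups)))

# Data of Example 20.7, one list per sample
samples = [
    [34, 28, 29, 37, 42, 27, 29, 35, 25, 29, 41, 40],
    [29, 32, 31, 43, 31, 29, 28, 30, 37, 44, 29, 31],
    [32, 34, 30, 42, 32, 33, 29, 27, 37, 26, 29, 31],
    [34, 29, 32, 28, 32, 34, 29, 31, 30, 37, 43, 42],
]
# Levene transformation: squared deviation from the group mean
squared_dev = [[(y - sum(g) / len(g)) ** 2 for y in g] for g in samples]
levene_f = f_statistic(squared_dev)
print(round(levene_f, 2))
```

Replacing the squared deviation by an absolute deviation from the mean or from the median gives the other variants mentioned above without changing the rest of the code.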

Although the Bartlett test is the most widely used test for homogeneity of
variances, a test due to Cochran provides a computationally simple procedure.
The Cochran test is one of the best methods for detecting cases where the
variance of one of the groups is much larger than that of the other groups. The
test statistic of the Cochran test is given by

$$C = \frac{\max_{1 \leq i \leq m} S_i^2}{\sum_{i=1}^{m} S_i^2}.$$

The Cochran test rejects the null hypothesis $H_0 : \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_m^2$ at a
significance level $\alpha$ if

$$C > C_\alpha.$$

The critical values of $C_\alpha$ were originally published by Eisenhart et al. (1947)
for some combinations of degrees of freedom $\nu$ and the number of groups $m$.
Here the degrees of freedom $\nu$ are

$$\nu = \max_{1 \leq i \leq m} (n_i - 1).$$

Example 20.9. For the data in Example 20.7 perform a Cochran test to
examine if the homogeneity of variances condition is met for a significance
level 0.05.

Answer: From the data the variances of each group can be found to be

$$S_1^2 = 35.2836, \quad S_2^2 = 30.1401, \quad S_3^2 = 19.4481, \quad S_4^2 = 24.4036.$$

Hence the test statistic for the Cochran test is

$$C = \frac{35.2836}{35.2836 + 30.1401 + 19.4481 + 24.4036} = \frac{35.2836}{109.2754} = 0.3229.$$

The critical value $C_{0.05}(3, 11)$ is given by 0.4884. Since

$$0.3229 = C < C_{0.05}(3, 11) = 0.4884,$$

at a significance level $\alpha = 0.05$ we do not reject the null hypothesis that
the variances are equal. Hence the Cochran test suggests that the homogeneity
of variances condition is met.

20.5. Exercises 

1. A consumer organization wants to compare the prices charged for a
particular brand of refrigerator in three types of stores in Louisville: discount
stores, department stores and appliance stores. Random samples of 6 stores
of each type were selected. The results are shown below.

Discount    Department    Appliance
  1200         1700          1600
  1300         1500          1500
  1100         1450          1300
  1400         1300          1500
  1250         1300          1700
  1150         1500          1400

At the 0.05 level of significance, is there any evidence of a difference in the
average price between the types of stores?

2. It is conjectured that a certain gene might be linked to ovarian cancer.
Ovarian cancer is sub-classified into three categories: stage I, stage II and
stage III-IV. There are three random samples available, one from each stage.
The samples are labelled with three color dyes and hybridized on a four
channel cDNA microarray (one channel remains unused). The experiment is
repeated 5 times and the following data were obtained.

            Microarray Data

Array    mRNA 1    mRNA 2    mRNA 3
  1        100        95        70
  2         90        93        72
  3        105        79        81
  4         83        85        74
  5         78        90        75

Is there any difference between the averages of the three mRNA samples at
0.05 significance level?

3. A stock market analyst thinks 4 stock mutual funds generate about the
same return. He collected the accompanying rate-of-return data on 4 different
mutual funds during the last 7 years. The data are given in the table below.

           Mutual Funds

Year      A      B      C      D
2000     12     11     13     15
2001     12     17     19     11
2002     13     18     15     12
2004     18     20     25     11
2005     17     19     19     10
2006     18     12     17     10
2007     12     15     20     12

Do a one-way ANOVA to decide whether the funds give different performance
at 0.05 significance level.

4. Give a proof of the Theorem 20.4. 

5. Give a proof of the Lemma 20.2. 

6. Give a proof of the Theorem 20.5. 

7. Give a proof of the Theorem 20.6. 

8. An automobile company produces and sells its cars under 3 different brand
names. An auto analyst wants to see whether the different brands of cars have
the same performance. He tested 20 cars from the 3 different brands and
recorded the mileage per gallon.

Brand 1    Brand 2    Brand 3
   32         31         34
   29         28         25
   32         30         31
   25         34         37
   35         39         32
   33         36
   34         38
   31

Do the data suggest a rejection of the null hypothesis at a significance level
0.05 that the mileage per gallon generated by the three different brands is the
same?

Chapter 21

GOODNESS OF FIT
TESTS


In point estimation, interval estimation or hypothesis testing we always
started with a random sample $X_1, X_2, ..., X_n$ of size $n$ from a known
distribution. In order to apply the theory to data analysis one has to know
the distribution of the sample. Quite often the experimenter (or data
analyst) assumes the nature of the sample distribution based on his subjective
knowledge.

Goodness of fit tests are performed to validate the experimenter's opinion
about the distribution of the population from which the sample is drawn.
The most commonly known and most frequently used goodness of fit tests
are the Kolmogorov-Smirnov (KS) test and the Pearson chi-square ($\chi^2$) test.
There is a controversy over which test is the most powerful, but the
general feeling seems to be that the Kolmogorov-Smirnov test is probably more
powerful than the chi-square test in most situations. The KS test measures
the distance between distribution functions, while the $\chi^2$ test measures the
distance between density functions. Usually, if the population distribution
is continuous, then one uses the Kolmogorov-Smirnov test, whereas if the
population distribution is discrete, then one performs Pearson's chi-square
goodness of fit test.

21.1. Kolmogorov-Smirnov Test 

Let $X_1, X_2, ..., X_n$ be a random sample from a population $X$. We
hypothesize that the distribution of $X$ is $F(x)$. Further, we wish to test our
hypothesis. Thus our null hypothesis is

$$H_0 : X \sim F(x).$$

We would like to design a test of this null hypothesis against the alternative
$H_a : X \not\sim F(x)$.

In order to design a test, first of all we need a statistic which will
unbiasedly estimate the unknown distribution $F(x)$ of the population $X$ using the
random sample $X_1, X_2, ..., X_n$. Let $x_{(1)} < x_{(2)} < \cdots < x_{(n)}$ be the observed
values of the order statistics $X_{(1)}, X_{(2)}, ..., X_{(n)}$. The empirical distribution
of the random sample is defined as

$$F_n(x) = \begin{cases} 0 & \text{if } x < x_{(1)}, \\ \frac{k}{n} & \text{if } x_{(k)} \leq x < x_{(k+1)}, \text{ for } k = 1, 2, ..., n - 1, \\ 1 & \text{if } x_{(n)} \leq x. \end{cases}$$

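The graph of the empirical distribution function $F_4(x)$ is shown below.

[Figure: Empirical Distribution Function. The step function $F_4(x)$ takes the
values 0, 0.25, 0.50, 0.75 and 1.00, with jumps at $x_{(1)}, x_{(2)}, x_{(3)}, x_{(4)}$.]

For a fixed value of $x$, the empirical distribution function can be considered
as a random variable that takes on the values

$$0, \; \frac{1}{n}, \; \frac{2}{n}, \; ..., \; \frac{n - 1}{n}, \; \frac{n}{n}.$$
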

First we show that $F_n(x)$ is an unbiased estimator of the population
distribution $F(x)$. That is,

$$E(F_n(x)) = F(x) \qquad (1)$$

for a fixed value of $x$. To establish (1), we need the probability density
function of the random variable $F_n(x)$. From the definition of the empirical
distribution we see that if exactly $k$ observations are less than or equal to $x$,
then

$$F_n(x) = \frac{k}{n}$$

which is

$$n F_n(x) = k.$$

The probability that an observation is less than or equal to $x$ is given by
$F(x)$.

[Figure: Distribution of the Empirical Distribution Function. There are $k$
sample observations each with probability $F(x)$, and $n - k$ sample observations
each with probability $1 - F(x)$.]

Hence (see figure above)

$$P(n F_n(x) = k) = P\left( F_n(x) = \frac{k}{n} \right) = \binom{n}{k} [F(x)]^k [1 - F(x)]^{n - k}$$

for $k = 0, 1, ..., n$. Thus

$$n F_n(x) \sim BIN(n, F(x)).$$

Thus the expected value of the random variable $n F_n(x)$ is given by

$$E(n F_n(x)) = n F(x)$$
$$n E(F_n(x)) = n F(x)$$
$$E(F_n(x)) = F(x).$$

This shows that, for a fixed $x$, $F_n(x)$, on average, equals the population
distribution function $F(x)$. Hence the empirical distribution function $F_n(x)$
is an unbiased estimator of $F(x)$.

Since $n F_n(x) \sim BIN(n, F(x))$, the variance of $n F_n(x)$ is given by

$$Var(n F_n(x)) = n F(x) [1 - F(x)].$$

Hence the variance of $F_n(x)$ is

$$Var(F_n(x)) = \frac{F(x) [1 - F(x)]}{n}.$$

It is easy to see that $Var(F_n(x)) \to 0$ as $n \to \infty$ for all values of $x$. Thus
the empirical distribution function $F_n(x)$ and $F(x)$ tend to be closer to each
other with large $n$. As a matter of fact, Glivenko, a Russian mathematician,
proved that $F_n(x)$ converges to $F(x)$ uniformly in $x$ as $n \to \infty$ with
probability one.

Because of the convergence of the empirical distribution function to the 
theoretical distribution function, it makes sense to construct a goodness of 
fit test based on the closeness of F n (x) and hypothesized distribution F(x). 

Let

$$D_n = \max_{x} |F_n(x) - F(x)|.$$

That is, $D_n$ is the maximum of all pointwise differences $|F_n(x) - F(x)|$. The
distribution of the Kolmogorov-Smirnov statistic $D_n$ can be derived.
However, we shall not do that here as the derivation is quite involved. Instead,
we give a closed form formula for $P(D_n \leq d)$. If $X_1, X_2, ..., X_n$ is a sample
from a population with continuous distribution function $F(x)$, then

$$P(D_n \leq d) = \begin{cases} 0 & \text{if } d \leq \frac{1}{2n} \\ n! \prod_{i=1}^{n} \int_{\frac{i}{n} - d}^{\frac{i - 1}{n} + d} du & \text{if } \frac{1}{2n} < d < 1 \\ 1 & \text{if } d \geq 1 \end{cases}$$

where $du = du_1 du_2 \cdots du_n$ with $0 < u_1 < u_2 < \cdots < u_n < 1$. Further,

$$\lim_{n \to \infty} P(\sqrt{n}\, D_n \leq d) = 1 - 2 \sum_{k=1}^{\infty} (-1)^{k - 1} e^{-2 k^2 d^2}.$$


These formulas show that the distribution of the Kolmogorov-Smirnov
statistic $D_n$ is distribution free, that is, it does not depend on the
distribution $F$ of the population.

For most situations, it is sufficient to use the following approximation
due to Kolmogorov:

$$P(D_n \leq d) \approx 1 - 2 e^{-2 n d^2} \qquad \text{for } d > \frac{1}{\sqrt{n}}.$$

If the null hypothesis $H_0 : X \sim F(x)$ is true, the statistic $D_n$ is small. It
is therefore reasonable to reject $H_0$ if and only if the observed value of $D_n$
is larger than some constant $d_n$. If the level of significance is given to be $\alpha$,
then the constant $d_n$ can be found from

$$\alpha = P(D_n > d_n \mid H_0 \text{ is true}) \approx 2 e^{-2 n d_n^2}.$$

This yields the following hypothesis test: Reject $H_0$ if $D_n \geq d_n$ where

$$d_n = \sqrt{\frac{1}{2n} \ln \frac{2}{\alpha}}$$

is obtained from the above Kolmogorov's approximation. Note that the
approximate value of $d_{12}$ obtained by the above formula is equal to 0.3533 when
$\alpha = 0.1$, however a more accurate value of $d_{12}$ is 0.34.
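
Solving $\alpha = 2 e^{-2 n d_n^2}$ for $d_n$ gives $d_n = \sqrt{\ln(2/\alpha) / (2n)}$, which is simple to evaluate. The following Python sketch is only an illustration of that approximation; the function name `ks_critical_value` is introduced here and is not part of the text.

```python
import math

def ks_critical_value(n, alpha):
    """Approximate critical value d_n from alpha = 2 * exp(-2 * n * d_n ** 2)."""
    return math.sqrt(math.log(2 / alpha) / (2 * n))

d12 = ks_critical_value(12, 0.1)
print(round(d12, 4))  # 0.3533
```

For small samples the tabulated critical values should be preferred, since the approximation overstates $d_{12}$ slightly (0.3533 against the more accurate 0.34).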

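Next we address the issue of the computation of the statistic $D_n$. Let
us define

$$D_n^+ = \max_{x} \{ F_n(x) - F(x) \}$$

and

$$D_n^- = \max_{x} \{ F(x) - F_n(x) \}.$$

Then it is easy to see that

$$D_n = \max \{ D_n^+, D_n^- \}.$$

Further, since $F_n(x_{(i)}) = \frac{i}{n}$, it can be shown that

$$D_n^+ = \max \left\{ \max_{1 \leq i \leq n} \left[ \frac{i}{n} - F(x_{(i)}) \right], \; 0 \right\}$$

and

$$D_n^- = \max \left\{ \max_{1 \leq i \leq n} \left[ F(x_{(i)}) - \frac{i - 1}{n} \right], \; 0 \right\}.$$

Therefore it can also be shown that

$$D_n = \max_{1 \leq i \leq n} \left\{ \max \left[ \frac{i}{n} - F(x_{(i)}), \; F(x_{(i)}) - \frac{i - 1}{n} \right] \right\}.$$

The following figure illustrates the Kolmogorov-Smirnov statistic $D_n$ when
$n = 4$.

[Figure: Kolmogorov-Smirnov Statistic.]
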

Example 21.1. The data on the heights of 12 infants are given below:
18.2, 21.4, 22.6, 17.4, 17.6, 16.7, 17.1, 21.4, 20.1, 17.9, 16.8, 23.1. Test
the hypothesis that the data came from some normal population at a
significance level $\alpha = 0.1$.

Answer: Here, the null hypothesis is

$$H_0 : X \sim N(\mu, \sigma^2).$$

First we estimate $\mu$ and $\sigma^2$ from the data. Thus, we get

$$\overline{x} = \frac{230.3}{12} = 19.2$$

and

$$s^2 = \frac{4482.01 - \frac{1}{12} (230.3)^2}{12 - 1} = \frac{62.17}{11} = 5.65.$$

Hence $s = 2.38$. Then by the null hypothesis

$$F(x_{(i)}) = P\left( Z \leq \frac{x_{(i)} - 19.2}{2.38} \right)$$

where $Z \sim N(0, 1)$ and $i = 1, 2, ..., n$. Next we compute the
Kolmogorov-Smirnov statistic $D_n$ for the given sample of size 12 using the
following tabular form.

  i     x_(i)     F(x_(i))     i/12 - F(x_(i))     F(x_(i)) - (i-1)/12
  1     16.7      0.1469          -0.0636               0.1469
  2     16.8      0.1562           0.0105               0.0729
  3     17.1      0.1894           0.0606               0.0227
  4     17.4      0.2236           0.1097              -0.0264
  5     17.6      0.2514           0.1653              -0.0819
  6     17.9      0.2912           0.2088              -0.1255
  7     18.2      0.3372           0.2461              -0.1628
  8     20.1      0.6480           0.0187               0.0647
  9     21.4      0.8212           0.0121               0.0712
 10     21.4
 11     22.6      0.9236          -0.0069               0.0903
 12     23.1      0.9495           0.0505               0.0328

Thus

$$D_{12} = 0.2461.$$

From the tabulated value, we see that $d_{12} = 0.34$ for significance level
$\alpha = 0.1$. Since $D_{12}$ is smaller than $d_{12}$ we accept the null hypothesis
$H_0 : X \sim N(\mu, \sigma^2)$. Hence we conclude that the data came from a normal
population.

Example 21.2. Let $X_1, X_2, ..., X_{10}$ be a random sample from a distribution
whose probability density function is

$$f(x) = \begin{cases} 1 & \text{if } 0 < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$

Based on the observed values 0.62, 0.36, 0.23, 0.76, 0.65, 0.09, 0.55, 0.26,
0.38, 0.24, test the hypothesis $H_0 : X \sim UNIF(0, 1)$ against
$H_a : X \not\sim UNIF(0, 1)$ at a significance level $\alpha = 0.1$.

Answer: The null hypothesis is $H_0 : X \sim UNIF(0, 1)$. Thus

$$F(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } 0 \leq x < 1 \\ 1 & \text{if } x \geq 1. \end{cases}$$

Hence

$$F(x_{(i)}) = x_{(i)} \qquad \text{for } i = 1, 2, ..., n.$$

Next we compute the Kolmogorov-Smirnov statistic $D_n$ for the given sample of
size 10 using the following tabular form.

  i     x_(i)     F(x_(i))     i/10 - F(x_(i))     F(x_(i)) - (i-1)/10
  1     0.09       0.09            0.01                 0.09
  2     0.23       0.23           -0.03                 0.13
  3     0.24       0.24            0.06                 0.04
  4     0.26       0.26            0.14                -0.04
  5     0.36       0.36            0.14                -0.04
  6     0.38       0.38            0.22                -0.12
  7     0.55       0.55            0.15                -0.05
  8     0.62       0.62            0.18                -0.08
  9     0.65       0.65            0.25                -0.15
 10     0.76       0.76            0.24                -0.14

Thus

$$D_{10} = 0.25.$$

From the tabulated value, we see that $d_{10} = 0.37$ for significance level
$\alpha = 0.1$. Since $D_{10}$ is smaller than $d_{10}$ we accept the null hypothesis

$$H_0 : X \sim UNIF(0, 1).$$
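
The tabular computation in Example 21.2 amounts to taking the largest of the two differences $\frac{i}{n} - F(x_{(i)})$ and $F(x_{(i)}) - \frac{i - 1}{n}$ over the ordered sample. A minimal Python sketch of this, with the hypothetical helper name `ks_statistic`, is given below.

```python
def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov statistic D_n for a sample and a hypothesized cdf."""
    xs = sorted(sample)
    n = len(xs)
    return max(
        max(i / n - cdf(x), cdf(x) - (i - 1) / n)
        for i, x in enumerate(xs, start=1)
    )

def uniform_cdf(x):
    """Distribution function of UNIF(0, 1)."""
    return min(max(x, 0.0), 1.0)

data = [0.62, 0.36, 0.23, 0.76, 0.65, 0.09, 0.55, 0.26, 0.38, 0.24]
d10 = ks_statistic(data, uniform_cdf)
print(round(d10, 2))  # 0.25
```

The same function can be reused for any continuous null distribution by passing a different `cdf`; only the comparison with the tabulated $d_n$ remains to be done by hand.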


21.2. Chi-square Test

The chi-square goodness of fit test was introduced by Karl Pearson in
1900. Recall that the Kolmogorov-Smirnov test is only for testing a specific
continuous distribution. Thus if we wish to test the null hypothesis

$$H_0 : X \sim BIN(n, p)$$

against the alternative $H_a : X \not\sim BIN(n, p)$, then we cannot use the
Kolmogorov-Smirnov test. The Pearson chi-square goodness of fit test can be
used for testing null hypotheses involving discrete as well as continuous
distributions. Unlike the Kolmogorov-Smirnov test, the Pearson chi-square test
uses the density function of the population $X$.

Let $X_1, X_2, ..., X_n$ be a random sample from a population $X$ with
probability density function $f(x)$. We wish to test the null hypothesis

$$H_0 : X \sim f(x)$$

against

$$H_a : X \not\sim f(x).$$

If the probability density function $f(x)$ is continuous, then we divide up the
abscissa of the probability density function $f(x)$ and calculate the probability
$p_i$ for each of the intervals by using

$$p_i = \int_{x_{i-1}}^{x_i} f(x)\, dx,$$

where $\{x_0, x_1, ..., x_m\}$ is a partition of the domain of $f(x)$.

[Figure: Discretization of continuous density function.]

Let $Y_1, Y_2, ..., Y_m$ denote the number of observations (from the random sample
$X_1, X_2, ..., X_n$) in the 1st, 2nd, 3rd, ..., $m$th interval, respectively.

Since the sample size is $n$, the number of observations expected to fall in
the $i$th interval is equal to $n p_i$. Then

$$Q = \sum_{i=1}^{m} \frac{(Y_i - n p_i)^2}{n p_i}$$

measures the closeness of the observed $Y_i$ to the expected number $n p_i$. The
distribution of $Q$ is approximately chi-square with $m - 1$ degrees of freedom.
The derivation of this fact is quite involved and beyond the scope of this
introductory level book.

Although the distribution of $Q$ for $m > 2$ is hard to derive, for $m = 2$
it is not very difficult. Thus we give a derivation to convince the reader that
$Q$ has a $\chi^2$ distribution. Notice that $Y_1 \sim BIN(n, p_1)$. Hence for large $n$ by
the central limit theorem, we have

$$\frac{Y_1 - n p_1}{\sqrt{n p_1 (1 - p_1)}} \sim N(0, 1).$$

Thus

$$\frac{(Y_1 - n p_1)^2}{n p_1 (1 - p_1)} \sim \chi^2(1).$$

Since

$$\frac{(Y_1 - n p_1)^2}{n p_1 (1 - p_1)} = \frac{(Y_1 - n p_1)^2}{n p_1} + \frac{(Y_1 - n p_1)^2}{n (1 - p_1)},$$

we have

$$\frac{(Y_1 - n p_1)^2}{n p_1} + \frac{(Y_1 - n p_1)^2}{n (1 - p_1)} \sim \chi^2(1),$$

which is

$$\frac{(Y_1 - n p_1)^2}{n p_1} + \frac{(n - Y_2 - n + n p_2)^2}{n p_2} \sim \chi^2(1)$$

due to the facts that $Y_1 + Y_2 = n$ and $p_1 + p_2 = 1$. Hence

$$\sum_{i=1}^{2} \frac{(Y_i - n p_i)^2}{n p_i} \sim \chi^2(1),$$

that is, the chi-square statistic $Q$ has an approximate chi-square distribution.
Now the simple null hypothesis

$$H_0 : p_1 = p_{10}, \; p_2 = p_{20}, \; \cdots, \; p_m = p_{m0}$$

is to be tested against the composite alternative

$$H_a : \text{at least one } p_i \text{ is not equal to } p_{i0} \text{ for some } i.$$

Here $p_{10}, p_{20}, ..., p_{m0}$ are fixed probability values. If the null hypothesis is
true, then the statistic

$$Q = \sum_{i=1}^{m} \frac{(X_i - n p_{i0})^2}{n p_{i0}},$$

where $X_i$ denotes the observed frequency of the $i$th category,
has an approximate chi-square distribution with $m - 1$ degrees of freedom.
If the significance level $\alpha$ of the hypothesis test is given, then

$$\alpha = P\left( Q \geq \chi^2_{1 - \alpha}(m - 1) \right)$$

and the test is "Reject $H_0$ if $Q \geq \chi^2_{1 - \alpha}(m - 1)$." Here $\chi^2_{1 - \alpha}(m - 1)$ denotes
a real number such that the integral of the chi-square density function with
$m - 1$ degrees of freedom from zero to this real number $\chi^2_{1 - \alpha}(m - 1)$ is $1 - \alpha$.
Now we give several examples to illustrate the chi-square goodness-of-fit test.

Example 21.3. A die was rolled 30 times with the results shown below:

Number of spots      1    2    3    4    5    6
Frequency (x_i)      1    4    9    9    2    5

If a chi-square goodness of fit test is used to test the hypothesis that the die
is fair at a significance level $\alpha = 0.05$, then what is the value of the chi-square
statistic and the decision reached?

Answer: In this problem, the null hypothesis is

$$H_0 : p_1 = p_2 = \cdots = p_6 = \frac{1}{6}.$$

The alternative hypothesis is that not all $p_i$'s are equal to $\frac{1}{6}$. The test will
be based on 30 trials, so $n = 30$. The test statistic is

$$Q = \sum_{i=1}^{6} \frac{(x_i - n p_i)^2}{n p_i},$$

where $p_1 = p_2 = \cdots = p_6 = \frac{1}{6}$. Thus

$$n p_i = (30)\, \frac{1}{6} = 5$$

and

$$Q = \sum_{i=1}^{6} \frac{(x_i - n p_i)^2}{n p_i} = \sum_{i=1}^{6} \frac{(x_i - 5)^2}{5} = \frac{1}{5} \left[ 16 + 1 + 16 + 16 + 9 \right] = \frac{58}{5} = 11.6.$$

The tabulated $\chi^2$ value for $\chi^2_{0.95}(5)$ is given by

$$\chi^2_{0.95}(5) = 11.07.$$

Since

$$11.6 = Q > \chi^2_{0.95}(5) = 11.07$$

the null hypothesis $H_0 : p_1 = p_2 = \cdots = p_6 = \frac{1}{6}$ should be rejected.
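
The chi-square statistic of Example 21.3 can be reproduced in a few lines. The Python sketch below is an illustration only; the function name `chi_square_statistic` is hypothetical and the critical value is still taken from the chi-square table.

```python
def chi_square_statistic(observed, probabilities):
    """Pearson statistic Q for observed counts and hypothesized probabilities."""
    n = sum(observed)
    return sum((x - n * p) ** 2 / (n * p) for x, p in zip(observed, probabilities))

frequencies = [1, 4, 9, 9, 2, 5]   # die rolls from Example 21.3
q_die = chi_square_statistic(frequencies, [1 / 6] * 6)
print(round(q_die, 1))  # 11.6
```

Because the function accepts any list of hypothesized probabilities, it applies equally to tests in which the categories are not equally likely.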

Example 21.4. It is hypothesized that an experiment results in outcomes 
K , L, M and N with probabilities g, yg, ^ and respectively. Forty 
independent repetitions of the experiment have results as follows: 


Outcome 

K 

L 

M 

N 

Frequency 

11 

14 

5 

10 


If a chi-square goodness of fit test is used to test the above hypothesis at the 
significance level a = 0.01, then what is the value of the chi-square statistic 
and the decision reached? 


Answer: Here the null hypothesis to be tested is

$$H_0 : p(K) = \frac{1}{5},\quad p(L) = \frac{3}{10},\quad p(M) = \frac{1}{10},\quad p(N) = \frac{2}{5}.$$

The test will be based on $n = 40$ trials. The test statistic is

$$Q = \sum_{k=1}^{4} \frac{(x_k - n p_k)^2}{n p_k}
    = \frac{(x_1 - 8)^2}{8} + \frac{(x_2 - 12)^2}{12} + \frac{(x_3 - 4)^2}{4} + \frac{(x_4 - 16)^2}{16}$$

$$  = \frac{(11 - 8)^2}{8} + \frac{(14 - 12)^2}{12} + \frac{(5 - 4)^2}{4} + \frac{(10 - 16)^2}{16}
    = \frac{9}{8} + \frac{4}{12} + \frac{1}{4} + \frac{36}{16}
    = \frac{95}{24} = 3.958.$$

From the chi-square table, we have

$$\chi^2_{0.99}(3) = 11.35.$$

Thus

$$3.958 = Q < \chi^2_{0.99}(3) = 11.35.$$

Therefore we accept the null hypothesis.
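
The fraction $\frac{95}{24}$ in Example 21.4 can be reproduced exactly with
rational arithmetic. The sketch below is illustrative only: the dictionary
layout and variable names are assumptions made for this example, and the
critical value 11.35 is copied from the chi-square table quoted in the text.

```python
from fractions import Fraction

# Observed frequencies and hypothesized probabilities from Example 21.4.
observed = {"K": 11, "L": 14, "M": 5, "N": 10}
hypothesized = {
    "K": Fraction(1, 5),
    "L": Fraction(3, 10),
    "M": Fraction(1, 10),
    "N": Fraction(2, 5),
}

n = sum(observed.values())                        # 40 trials
expected = {k: n * p for k, p in hypothesized.items()}

# Q = sum over outcomes of (x_k - n p_k)^2 / (n p_k), kept as an exact fraction.
q = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

# Critical value chi^2_{0.99}(3), taken from the table quoted in the text.
critical_value = 11.35
reject_h0 = float(q) > critical_value

print(f"Q = {q} = {float(q):.3f}, reject H0: {reject_h0}")
```

Exact fractions avoid rounding in the intermediate terms, which makes it easy to
compare each summand with the hand computation above.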

Example 21.5. Test at the 10% significance level the hypothesis that the
following data

    06.88  06.92  04.80  09.85  07.05  19.06  06.54  03.67  02.94  04.89
    69.82  06.97  04.34  13.45  05.74  10.07  16.91  07.47  05.04  07.97
    15.74  00.32  04.14  05.19  18.69  02.45  23.69  44.10  01.70  02.14
    05.79  03.02  09.87  02.44  18.99  18.90  05.42  01.54  01.55  20.99
    07.99  05.38  02.36  09.66  00.97  04.82  10.43  15.06  00.49  02.81

give the values of a random sample of size 50 from an exponential distribution
with probability density function

$$f(x; \theta) = \begin{cases} \frac{1}{\theta}\, e^{-x/\theta} & \text{if } 0 < x < \infty \\ 0 & \text{elsewhere,} \end{cases}$$

where $\theta > 0$.


Answer: From the data $\bar{x} = 9.74$ and $s = 11.71$. Notice that

$$H_0 : X \sim EXP(\theta).$$

Hence we have to partition the domain of the exponential distribution into
$m$ parts. There is no rule to determine what the value of $m$ should be. We
assume $m = 10$ (an arbitrary choice for the sake of convenience). We partition
the domain of the given probability density function into 10 mutually disjoint
sets of equal probability. This partition can be found as follows.

Note that $\bar{x}$ estimates $\theta$. Thus

$$\hat{\theta} = \bar{x} = 9.74.$$


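Now we compute the points $x_1, x_2, \ldots, x_{10}$ which will be used to
partition the domain of $f(x)$. With $x_0 = 0$,

$$\frac{1}{10} = \int_{x_0}^{x_1} \frac{1}{\theta}\, e^{-x/\theta}\, dx
             = \left[ -e^{-x/\theta} \right]_0^{x_1}
             = 1 - e^{-x_1/\theta}.$$

Hence

$$x_1 = \theta \ln\left(\frac{10}{9}\right) = 9.74 \ln\left(\frac{10}{9}\right) = 1.026.$$

Using the value of $x_1$, we can find the value of $x_2$. That is

$$\frac{1}{10} = \int_{x_1}^{x_2} \frac{1}{\theta}\, e^{-x/\theta}\, dx
             = e^{-x_1/\theta} - e^{-x_2/\theta}.$$

Hence

$$x_2 = -\theta \ln\left( e^{-x_1/\theta} - \frac{1}{10} \right).$$

In general

$$x_k = -\theta \ln\left( e^{-x_{k-1}/\theta} - \frac{1}{10} \right)$$

for $k = 1, 2, \ldots, 9$, and $x_{10} = \infty$. Using these $x_k$'s we find the
intervals $A_k = [x_k, x_{k+1})$, which are tabulated in the table below along
with the number of data points in each interval.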


    Interval $A_i$        Frequency ($o_i$)    Expected value ($e_i$)
    [0, 1.026)                    3                     5
    [1.026, 2.173)                4                     5
    [2.173, 3.474)                6                     5
    [3.474, 4.975)                6                     5
    [4.975, 6.751)                7                     5
    [6.751, 8.925)                7                     5
    [8.925, 11.727)               5                     5
    [11.727, 15.676)              2                     5
    [15.676, 22.427)              7                     5
    [22.427, $\infty$)            3                     5
    Total                        50                    50


From this table, we compute the statistic

$$Q = \sum_{i=1}^{10} \frac{(o_i - e_i)^2}{e_i} = 6.4,$$

and from the chi-square table, we obtain

$$\chi^2_{0.9}(9) = 14.68.$$

Since

$$6.4 = Q < \chi^2_{0.9}(9) = 14.68,$$

we accept the null hypothesis that the sample was taken from a population
with exponential distribution.
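
The steps of Example 21.5 (estimating $\theta$, building ten cells of equal
probability, counting the observations, and computing $Q$) can be sketched in
code. This is only an illustration under stated assumptions: it uses the
unrounded sample mean instead of 9.74, and it uses the closed form
$x_k = -\theta \ln(1 - k/10)$, which follows from the recursion above because
$e^{-x_k/\theta} = 1 - k/10$. All variable names are illustrative, and the
critical value 14.68 is copied from the table quoted in the text.

```python
import math

# The 50 observations of Example 21.5, copied from the text.
data = [
    6.88, 6.92, 4.80, 9.85, 7.05, 19.06, 6.54, 3.67, 2.94, 4.89,
    69.82, 6.97, 4.34, 13.45, 5.74, 10.07, 16.91, 7.47, 5.04, 7.97,
    15.74, 0.32, 4.14, 5.19, 18.69, 2.45, 23.69, 44.10, 1.70, 2.14,
    5.79, 3.02, 9.87, 2.44, 18.99, 18.90, 5.42, 1.54, 1.55, 20.99,
    7.99, 5.38, 2.36, 9.66, 0.97, 4.82, 10.43, 15.06, 0.49, 2.81,
]

m = 10                       # number of cells (an arbitrary choice, as in the text)
n = len(data)
theta_hat = sum(data) / n    # the sample mean estimates theta

# Cut points with P(X < x_k) = k/m under EXP(theta_hat).
cut_points = [-theta_hat * math.log(1 - k / m) for k in range(1, m)]
edges = [0.0] + cut_points + [math.inf]

# Observed count in each half-open interval [lo, hi).
observed = [sum(1 for x in data if lo <= x < hi)
            for lo, hi in zip(edges, edges[1:])]
expected = n / m             # every cell has probability 1/m

q = sum((o - expected) ** 2 / expected for o in observed)

# Critical value chi^2_{0.9}(9), taken from the table quoted in the text.
critical_value = 14.68
reject_h0 = q > critical_value

print("cut points:", [round(c, 3) for c in cut_points])
print("observed counts:", observed)
print(f"Q = {q:.1f}, reject H0: {reject_h0}")
```

Choosing cells of equal probability makes every expected count equal to $n/m$,
so the statistic depends only on how unevenly the observations fall across the
cells.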

21.3. Review Exercises 

1. The data on the heights of 4 infants are: 18.2, 21.4, 16.7 and 23.1. For
a significance level $\alpha = 0.1$, use the Kolmogorov-Smirnov test to test the
hypothesis that the data came from some uniform population on the interval
(15, 25). (Use $d_4 = 0.56$ at $\alpha = 0.1$.)

2. A four-sided die was rolled 40 times with the following results:

    Number of spots    1    2    3    4
    Frequency          5    9   10   16

If a chi-square goodness-of-fit test is used to test the hypothesis that the die
is fair at a significance level $\alpha = 0.05$, then what is the value of the
chi-square statistic?

3. A coin is tossed 500 times and $k$ heads are observed. If the chi-square
distribution is used to test the hypothesis that the coin is unbiased, this
hypothesis will be accepted at the 5 percent level of significance if and only
if $k$ lies between what values? (Use $\chi^2_{0.05}(1) = 3.84$.)

4. It is hypothesized that an experiment results in outcomes A, C, T and G 

with probabilities and |, respectively. Eighty independent repeti¬ 

tions of the experiment have results as follows: 


    Outcome      A    G    C    T
    Frequency    3   28   15   34

If a chi-square goodness-of-fit test is used to test the above hypothesis at the
significance level $\alpha = 0.1$, then what is the value of the chi-square
statistic and the decision reached?

5. A die was rolled 50 times with the results shown below:

    Number of spots      1    2    3    4    5    6
    Frequency ($x_i$)    8    7   12   13    4    6

If a chi-square goodness-of-fit test is used to test the hypothesis that the die
is fair at a significance level $\alpha = 0.1$, then what is the value of the
chi-square statistic and the decision reached?
6. Test at the 10% significance level the hypothesis that the following data

    05.88  05.92  03.80  08.85  06.05  18.06  05.54  02.67  01.94  03.89
    70.82  07.97  05.34  14.45  06.74  11.07  17.91  08.47  06.04  08.97
    16.74  01.32  03.14  06.19  19.69  03.45  24.69  45.10  02.70  03.14
    04.79  02.02  08.87  03.44  17.99  17.90  04.42  01.54  01.55  19.99
    06.99  05.38  03.36  08.66  01.97  03.82  11.43  14.06  01.49  01.81

give the values of a random sample of size 50 from an exponential distribution
with probability density function

$$f(x; \theta) = \begin{cases} \frac{1}{\theta}\, e^{-x/\theta} & \text{if } 0 < x < \infty \\ 0 & \text{elsewhere,} \end{cases}$$

where $\theta > 0$.

7. Test at the 10% significance level the hypothesis that the following data

    0.88  0.92  0.80  0.85  0.05  0.06  0.54  0.67  0.94  0.89
    0.82  0.97  0.34  0.45  0.74  0.07  0.91  0.47  0.04  0.97
    0.74  0.32  0.14  0.19  0.69  0.45  0.69  0.10  0.70  0.14
    0.79  0.02  0.87  0.44  0.99  0.90  0.42  0.54  0.55  0.99
    0.94  0.38  0.36  0.66  0.97  0.82  0.43  0.06  0.49  0.81

give the values of a random sample of size 50 from a distribution
with probability density function

$$f(x; \theta) = \begin{cases} (1 + \theta)\, x^{\theta} & \text{if } 0 < x < 1 \\ 0 & \text{elsewhere,} \end{cases}$$

where $\theta > 0$.


8. Test at the 10% significance level the hypothesis that the following data

    06.88  06.92  04.80  09.85  07.05  19.06  06.54  03.67  02.94  04.89
    29.82  06.97  04.34  13.45  05.74  10.07  16.91  07.47  05.04  07.97
    15.74  00.32  04.14  05.19  18.69  02.45  23.69  24.10  01.70  02.14
    05.79  03.02  09.87  02.44  18.99  18.90  05.42  01.54  01.55  20.99
    07.99  05.38  02.36  09.66  00.97  04.82  10.43  15.06  00.49  02.81



give the values of a random sample of size 50 from an exponential distribution 
with probability density function 

if 0 < x < 9 


f(x; 9) = 


0 


elsewhere. 
9. Suppose that in 60 rolls of a die the outcomes 1, 2, 3, 4, 5, and 6 occur
with frequencies $n_1$, $n_2$, 14, 8, 10, and 8 respectively. What is the least
value of $\sum_{i=1}^{2} (n_i - 10)^2$ for which the chi-square test rejects the
hypothesis that the die is fair at the 1% level of significance?

10. It is hypothesized that of all marathon runners 70% are adult men, 25%
are adult women, and 5% are youths. To test this hypothesis, the following
data from a recent marathon are used:

    Adult Men    Adult Women    Youths    Total
       630           300          70      1000

A chi-square goodness-of-fit test is used. What is the value of the statistic?
ANSWERS TO SELECTED REVIEW EXERCISES


CHAPTER 1 


1912’ 

2. 244. 

3. 7488. 

4 - (a) (b) ^ and (c) 

5. 0.95. 

6. f. 

7. |. 

8. 7560. 

10. 4 3 . 

11 . 2 . 

12. 0.3238. 

13. S has countable number of elements. 

14. S has uncountable number of elements. 

1 ^ - 25 - 
±D ‘ 648’ 


16. (n — l)(n — 2) (i) 

17. (5!) 2 . 


n+1 


18. + 


19. i. 
20 


1 

3’ 
n+1 


3n— 1 - 


21 — 
ZL. u . 

22. i. 
CHAPTER 2


i. §. 

(6!) 2 


( 21 ) 6 - 


3. 0.941. 


4. 


5. 


6 . 


4 

5 * 

_ 6 _ 
11 ' 

255 

256 


7. 0.2929. 


8 . 

9. 


10 . 


10 
17’ 
30 
31' 
T_ 
24' 


11 . 


(A)(1) 

(A) (l)+(A) (§)' 


12 . 


(0.01) (0.9) 

(0.01) (0.9) + (0.99) (0.1) ’ 


15 - (a) (I) (^) + (I) (ts) and (b) (j)(i 5 )+(j)( 

16. '. 



18. 5. 
IQ JL 

42- 

20. 
CHAPTER 3

1- l 

9 fc+1 
2/c+l ' 

Q _L 

m- 

4. Mode of X = 0 and median of X = 0. 

5. 0 1n(f). 

6. 2 In 2. 

7. 0.25. 

8. /(2) = 0.5, /(3) = 0.2, /(tt) = 0.3. 

9. /(*) = \x*e- x . 

10. f. 

11. a = 500, mode = 0.2, and P(X > 0.2) = 0.6766. 

12. 0.5. 

13. 0.5. 

14. 1 - F(-y). 

15. i. 

16. R x = {3,4,5,6,7,8,9}; 

/( 3 ) = m = /( 5 ) = m = /( 7 ) = m = m = £>■ 

17. R x = {2,3,4, 5, 6,7,8, 9,10,11,12}; 

/(2) = ie, m = i, /(4) = i, /(5) = /(6) = /(7) = /(8) = 

a, m = /(io) = i, /(ii) = i, /(i2) = i 6 . 

18. R x = { 0,1,2,3,4,5}; 

/(0) = , /(1) = /(2) = ?§2, /(3) = fig, /(4) = /(5) = T i F . 

19. R x = {1,2, 3,4, 5,6,7}; 

/(l) = 0.4, /(2) = 0.2666,/(3) = 

0.0476, /(6) = 0.0190, /(7) = 0.0048. 

20. c = 1 and P(X = even ) = 

21. c= |, P(l< X<2)= 

22. c = | and P (X < 


0.1666, /(4) 


0.0952, /(5) 
CHAPTER 4


1. -0.995. 

2. (a) i (b) i, (c) |. 

3. (c) 0.25, (d) 0.75, (e) 0.75, (f) 0. 

4. (a) 3.75, (b) 2.6875, (c) 10.5, (d) 10.75, (e) -71.5. 

5. (a) 0.5, (b) 7r, (c) ^ n. 

6 Hi 

24 


E(x 


7. 

8 . 

9. 280. 

10 . — 

2Q . 

11. 5.25. 

12. a = % E(X) = Par(X) =£[§-£],■ 


_ 7 


^cn = i- 


13. F(A) = 

14 — 

38’ 

15. -38. 

16. M(t) = 1 + 2t + 6i 2 H- 

17 - 

is- /? n nr=o(fc+*)- 

19. \ [3e 2 * + e 3t ]. 

20. 120 . 

21. F(A) = 3, Var(X) = 2. 

22 . 11 . 

23. c=E(X). 

24. F(c) = 0.5. 

25. F(A') = 0, Var(X) = 2. 

26. gig. 

27. 38. 


28. $a = 5$ and $b = -34$ or $a = -5$ and $b = 36$.


29. -0.25. 

30. 10. 

31. -j^plnp. 
CHAPTER 5


1 

16 • 

2 . — 

16 

o 25 

a. 72 . 

a 4375 
** 279936’ 

5. |. 


6. 11 

16’ 

7. 0.008304. 

8 . |. 

9. i. 

10. 0.671. 

11 . 

12. 0.0000399994. 

1 q n 2 — 3n+2 
2 n + 1 ’ 

14. 0.2668. 

(3-fc) (t) 

(a°) ’ 

16. 0.4019. 


15. 


0 < A; < 3. 


17 ■ 1 - 

is. r/) a) 3 (i) 

19. A 

20. 0.22345. 

21. 1.43. 

22. 24. 

23. 26.25. 

24. 2. 

25. 0.3005. 

26 - 

27. 0.9130. 


x —3 


28. 0.1239. 
CHAPTER 6


1 . f(x) = e~ x 0 

2. $Y \sim UNIF(0, 1)$.

3 - /H = wfep e " 

4. 0.2313. 

5. 3 In 4. 

6 . 20.1 a. 

7- !• 

8 . 2 . 0 . 

9. 53.04. 

10. 44.5314. 

11. 75. 

12. 0.4649. 


0 < x < oo. 


14. 0.8664. 

i r -k 2 

15. e. 


17. 64.3441. 
18 - <?(*/) = | 

19. 0.5. 

20. 0.7745. 

21. 0.4. 


22. |\y s e 

23 - vky^ 

24. ln(X) - / 

25. e^-^ 2 . 

26. e M . 

27. 0.3669. 


if 0 < y < V2 


otherwise. 


29. $Y \sim GBETA(\alpha, \beta, a, b)$.

32. (i) (ii) 5 , (iii) ( iv ) 

33. (i) T | g , (ii) (100) 13 ^, (iii) gig. 

35. (l-g)’. 

36. E(X n ) = e n r( "+” ) . 
CHAPTER 7


!• fi{x) = and f 2 {y) = 


2 - f(x,y) 


3 ^ if 1 < x < y = 2x < 12 

3 g if 1 < x < y < 2x < 12 

0 otherwise. 


3. 

1 

18’ 

4. 

1 

2e 4 ’ 

5. 

1 

3’ 

6. 

ii 

7. 

(e 2 -l)(e- 

e 5 

8. 

0.2922. 

9. 

5 

T 


lo 


10 . A(x) = 


|A8- 


if 0 < x < 1 
otherwise. 


a; 3 ) if 0 < x < 2 
otherwise. 


ii- h{y) = 

f 2y if 0 < y < 1 


1 0 otherwise. 


12. f(y/x) 


r 1 

l+Ai-^-i) 2 

if (x - l) 2 + (y 


lo 

otherwise. 

13. f. 




14. /(y/x) 

= \ 

f ^ if 0 < y < 

2 x < 1 

1 

[ 0 otherwise. 


15. |. 




16. y(io) = 

2 e“ 

~ w - 2e~ 2w . 


17. g(w) = 

(‘ 

w 3 \ 6to 2 

03 y 03 


18. §. 




19. 




20. |. 

6 




21. No. 





I) 2 < 1 


22 . Yes. 

23. X 

24. i. 

25. i. 

26. xe _x . 
CHAPTER 8

1. 13. 

2. Cav(X,Y) = 0. Since 0 = /(0,0) ^ /i(0)/ 2 (0) = X and Y are not 

independent. 

Q J_ 

v^- 

4 _I_ 

(1—4t)(l —6t) * 

5. $X + Y \sim BIN(n + m, p)$.

6. \ (A 2 + Y 2 ) ~ EXP( 1). 

7. M(s,t) = ^ + ^f i. 

n _15 

16 - 

10. $Cov(X, Y) = 0$. No.

11 . a = | and b = |. 

12. CW = -^. 

13. Corr(X,Y) = 

14. 136. 

15. 1 -\/l + p ■ 

16. (1 — p + pe 4 )(l — p+pe - *). 

17. £[l + (n-l)p]. 

18. 2. 


20 . 1 . 
CHAPTER 9


1 . 6 . 

2. \{l+x 2 ). 

3. \y 2 . 

4. l+x. 

5. 2x. 

6. y x = and fi Y = np- 

1^ 2+3t/—28y 3 
* 3 l+2y—8y 2 * 

8. fs. 

9. 

10. | a;. 

11. 203. 

12. 15 - i. 

7T 

13. ^(1-z) 2 . 

14. i (l-z 2 ) 2 . 

( for 0 < y < 1 

!5. / 2 (y) = { ' and Var (X\Y = §) = ^ 

( 3(2 ~ y) if 1 < y < 2 

16. i 

17. 180. 


19. 


iE 

6 


A 

12 ’ 


20 . § + 1 . 
CHAPTER 10


i- g(y) = 


2 - g(y) = 


3- g(y) = 


4. g(z) = 


5. g{z,x) 


6 . g{y) = 


7. g(z) = 


8 . g(u) = 


\ + 

1 

4 y/y 

for 0 < y < 1 

0 


otherwise. 

3 Vy 

16 my/rn 

for 0 < y < 4 m 

0 


otherwise. 

2 y 

for 0 < y < 1 


0 otherwise. 

Jg {z + 4) for —4 < z < 0 

^ (4 - z) for 0 < 2 < 4 



otherwise. 

for 0 <a:<z <2 + 2 :<oo 


v 0 otherwise, 
pr for 0 < y < y/2 


0 otherwise. 

Z^“ z 

15000 250 25 

_8_2z _ ^z 2 _ z^_ 

15 25 250 15000 


for 0 < 2 < 10 
for 10 < 2 < 20 


4a 2 


In I 


otherwise. 

2a (M~2a) f or 2a < U < OO 

\ U — CL) — 


otherwise. 


9. 

h(y) = ^ 

-22 + 1 
216 ’ 

Z = 1 


r 

4 h 3 

fit p - 

10. 

II 

my/lF Y 

m e 


i 

0 



II 

/ 3 u 

1 350 

9v 
' 350 

11. 




lo 


12. 

«5 

II 

( 2 u 

\ (i+«) 3 

i 

if 0 


, 2 , 3 , 4 , 5 , 6 . 

o/t,2 z 

for 0 < z < oo 
otherwise. 

for 10 < 3u + v < 20, u > 0 
otherwise. 

<U< 00 


0 


otherwise. 


, v > 0 
13. g(u,v) = 


14. g(u,v) = 


5 \9v 3 -5v?v+3uv 2 +u 3 ] 

— - 32768 - - f° r 0 < 2v + 2u < 3v — 


0 otherwise. 

^ for 0 < u + v < 2v / fw--~3 u < 8 


^ 0 otherwise. 

2 + Au + 2 u 2 if — 1 < u < 0 

15. g x (u) = l 2y/l - Au if 0 < u < \ 


otherwise, 
if 0 < u < 1 


16. gx(u) = 

17. gx(u) = 

18. gx(u) = 


| u 5 if 1 < u < oo 


0 otherwise. 

4rt s — 4u if 0 < u < 1 

0 otherwise. 

2 u~ 3 if 1 < u < oo 


0 


/ W 

6 


19. f(w) = { 


otherwise, 
if 0 < w < 2 

if 2 < w < 3 


^ if 3 < w < 5 


otherwise. 


20. $BIN(2n, p)$

21. $GAM(\theta, 2)$

22. $CAU(0)$

23. 2a 2 ) 


24. A (a) = 


z(2-|a|) if|a|<2 


M0) = 


~h MI/31) 


u < 16 


if m < i 


0 


otherwise, 


0 


otherwise. 
CHAPTER 11


2 . 


10 ■ 


3. 


960 
7 5 ■ 


6. 0.7627.

CHAPTER 12

6. 0.16.

CHAPTER 13

3. 0.115. 

4. 1.0. 

e; J_ 

D. lg . 

6. 0.352. 

7 - I- 

8. 100.64. 

q l+ln(2) 

a. 2 

10. [1 — F(xq)} 5 . 

11 . 0+1 

12 . 2 e~ w [1 - e~ w ]. 

13. 6|j(l-£). 

14. N(0, 1). 

15. 25. 

16. X has a degenerate distribution with MGF M(t) = eM. 

17. $POI(1995\lambda)$.

18. (§)"(« + 1). 

19. jjg 35. 

20. fix) = ™ (l — e - ^) 3 e - ^ for 0 < a; < oo. 

21. $X_{(n+1)} \sim Beta(n + 1, n + 1)$.

CHAPTER 14

1. N(0, 32). 

2. x 2 (3); the MGF of X? 

3. t( 3). 

4. f(xi,x 2 ,x 3 ) = — 

5. a 2 


- Xl is M(t) = 


+ X 2 +X 3 ) 

e 


1 

Vl-4t 2 ' 


6 . t{ 2). 

7. M(i) = . ' 

v ' yj (1—2t)(l—4t)(l—6t)(l —8t) 

8. 0.625. 

9. £$(n- 1). 

10 . 0 . 

11. 27. 

12. $\chi^2(2n)$.

13. t(n + p). 

14. x 2 N- 

15. (1,2). 

16. 0.84. 

17. Nr- 

18. 11.07. 

19. $\chi^2(2n - 2)$.

20. 2.25. 


21. 6.37. 
CHAPTER 15


1. 


\ 


1 

n / j i 


2 . 1 

3. 


x-i' 
2 


X ' 

4.- 5 


^TlnXi 


i=1 


5. 


- 1 . 




i=l 

6 . — 

A 

7. 4.2 


o 19 
26’ 

n 15 

j. 4 . 


10 . 2 . 

11. $\alpha = 3.534$ and $\beta = 3.409$.

12 . 1 . 

13. j max{a;i, X 2 , ..., x n }. 


14. 


1 - 


max{a?i,X 2 ,...,x n } ’ 


15. 0.6207. 
18. 0.75. 


19. 


-1 + 


5 

ln(2) • 


20 . 


21 . 


A 

1+A 

X 
4 ’ 


22 . 8 . 


n 


i =1 


23. 
\ _ nX Q j - _ nl 
A — (n-l)S 2 dllU a ~ (n-l)S 2 ’ 


10 n 

p(l-p)' 


S=f, 

8 is obtained by solving numerically the equation ]T" =1 ^^7-e) 2 = 0- 

9 is the median of the sample. 


(i-p) p 2 • 

9 = 3X. 


0 _ 50 v 
" ~~ 30 ' 
CHAPTER 16

i r g-|-cot'(Ti,T 2 ) 

<tJ+<t|-2coi»(Ti,T 2 ’ 

2. 0 = Ixf, E(fxf) = e, unbiased. 

4. n = 20. 

5. fc= i. 

6 . a = |f, 6= |f, c= 12.47. 

n 

7 ■ E* 3 - 

2=1 

n 

8. E X ?’ na 

2=1 

9. k = -. 

7T 

10. k = 2. 

11. fc = 2. 

n 

13. Inj](l + X 4 ). 

2=1 

n 

14. E^ 2 - 
1 — 1 

15. Afr), and sufficient. 

16. $X_{(1)}$ is biased and $\bar{X} - 1$ is unbiased. $X_{(1)}$ is more efficient than $\bar{X} - 1$.

n 

17. ^TlnX,. 

2=1 

n 

18. Y, X i- 

2=1 

n 

19. In Xj. 

i=i 

22. Yes. 


23. Yes. 
24. Yes.

25. Yes. 

26. Yes. 
CHAPTER 17


r. The pdf of Qis <,(,) = 

t () otherwise. 


The confidence interval is A (1) - ^ In (f) , X ( i) - f- In ( 2 =^) ■ 

8 . The pdf of Q is g(q) = { \ e ~^ q if 0 < q < 00 

10 otherwise. 

The confidence interval is X (1) - £ In (f) , X {1) - f- In ■ 


9. The pdf of Q is g{q) = j ” q 


if 0 < q < 1 
otherwise. 


The confidence interval is A'^ — ^ In (£) , — f- In ( 2 ^ 0 ) ■ 

10. The pdf g(q) of Q is given by g(q) = { " ^ if ° ^ 1 

^ 0 otherwise. 

The confidence interval is (f) n X( ra ), ( 2 ^)" -^(n) • 


11. The pdf of Q is given by g{q) = 


n (n — 1) q n 2 (1 — q) if 0 < q < 1 
0 otherwise. 


12 . X (1) -z f ^, X (1) + z*^ . 

13 - H W’ w] ’ where * = ~ 1+ ElC 1 ^ ' 

14 - 

15. [Z-4-za ^4, X-4 + .za ^ . 

2 V n ’ 2 V n 

1 fi y, , _ Ani y 1 X(n) 

AU ’ L (") 2 (71+1) yrt+2’ ( n ) ' r 2 (ra+1) x ^+2_ ' 


17. iA + ^^4= . 

4 28 v^ ’ 4 28 
CHAPTER 18

1. $\alpha = 0.03125$ and $\beta = 0.763$.

2. Do not reject $H_0$.

3. a = 0.0511 and /3(A) = 1 - ^^77-’ A ^ °- 5 - 

4. $\alpha = 0.08$ and $\beta = 0.46$.

5. $\alpha = 0.19$.

6. $\alpha = 0.0109$.

7. $\alpha = 0.0668$ and $\beta = 0.0062$.

8. C = {(# 1 , 2 : 2 ) | x 2 > 3.9395}. 

9. C = {(ori,...,xio) \x> 0.3}. 

10 . C = {x € [0,1] \x > 0.829}. 

11. C = {(x \, x 2 ) | x\ + x-i > 5}. 

12. C = {(xi,...,a; 8 ) \x-x\nx< a}. 

13. C = {(x\y...,x n ) 1351nx — x < a}. 

14. C= |(®i,...,® 5 ) I ( 5 ^= 2 ) x 5 <a|. 

15. C = {(xi,X 2 ,X 3 ) | \x— 3| > 1.96}. 

16. C = |(a;i, X 2 , X 3 ) | xe~i x < a|. 

17. C= {{ Xl ,x 2 ,...,x n ) | ( T f 1 ) 3i <a|. 

18. i. 

19. C= {(x 1 ,x 2 ,x 3 ) I x (3) < \ZlV 7 }. 

20. C = {(xi,X 2 ,X 3 ) | x > 12.04}. 

21- a= je and (3 = §§§. 


22. $\alpha = 0.05$.

CHAPTER 21

9. $\sum_{i=1}^{2} (n_i - 10)^2 \geq 63.43$.

10. 25.



